Unsubscribe.
Thanks in advance.
Regards,
Ajay
> the column in the schema
> that you are using to read.
>
> Regards,
> Gourav
>
> On Sun, Jun 16, 2019 at 2:48 PM wrote:
>
>> Hi Team,
>>
>>
>>
>> Can we have another column which gives the corrupted record reason in
>> permissive mode while reading csv.
>>
>>
>>
>> Thanks,
>>
>> Ajay
>>
>
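Spark does not provide such a column out of the box: in PERMISSIVE mode the reader only stores the raw malformed line in the column named by the columnNameOfCorruptRecord option. A rough reason can be derived afterwards; the sketch below is pure Scala, and the comma-split heuristic and messages are illustrative, not Spark behavior:

```scala
// Hedged sketch: derive a human-readable "reason" for a corrupt CSV row after
// reading with .option("mode", "PERMISSIVE")
//              .option("columnNameOfCorruptRecord", "_corrupt_record").
// The field-count heuristic below is an illustration only.
object CorruptReason {
  def reason(line: String, expectedCols: Int): Option[String] = {
    val n = line.split(",", -1).length          // -1 keeps trailing empty fields
    if (n < expectedCols) Some(s"too few fields ($n of $expectedCols)")
    else if (n > expectedCols) Some(s"too many fields ($n of $expectedCols)")
    else None                                   // field count matches; not corrupt by this check
  }
}
```

Applied as a UDF over the _corrupt_record column, this would yield the kind of reason column asked about.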
But I believe this only pertains to
> standalone mode and we are using the mesos deployment mode. So I don't
> think this flag actually does anything.
>
>
> Thanks,
> Jeff
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
--
Thanks,
Ajay
--
Thanks,
Ajay
You can look at the Spark application UI at port 4040 (the standalone master UI is on port 8080). It should show all
the currently running stages as well as completed and pending stages.
On Sun, May 20, 2018, 12:22 AM giri ar wrote:
> Hi,
>
>
> Good Day.
>
> Could you please let me know whether we can see spark logical or physical plans
--
--
Thanks,
Ajay
java.io.IOException: Can't get Master Kerberos principal for use as renewer
sc.textFile("hdfs://vm1.comp.com:8020/user/myusr/temp/file1").collect().foreach(println)
//Getting this error: java.io.IOException: Can't get Master
Kerberos principal for use as renewer
}
}
On Mon, Nov 7, 2016
Did anyone use
https://www.codatlas.com/github.com/apache/spark/HEAD/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
to interact with secured Hadoop from Spark ?
Thanks,
Ajay
On Mon, Nov 7, 2016 at 4:37 PM, Ajay Chander <itsche...@gmail.com> wrote:
>
> Hi Everyone,
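For context, "Can't get Master Kerberos principal for use as renewer" typically means the Hadoop client configs (core-site.xml/yarn-site.xml, which name the HDFS/YARN principals used as token renewers) are not on the classpath. A hedged sketch of a kerberized submission, with hypothetical principal and keytab paths:

```shell
# Hypothetical principal/keytab paths. HADOOP_CONF_DIR must point at the
# cluster's client configs so Spark can resolve the HDFS/YARN principals.
export HADOOP_CONF_DIR=/etc/hadoop/conf
kinit -kt /etc/security/keytabs/myusr.keytab myusr@COMP.COM
# --principal/--keytab let Spark re-obtain delegation tokens for long jobs.
spark-submit \
  --master yarn \
  --principal myusr@COMP.COM \
  --keytab /etc/security/keytabs/myusr.keytab \
  --class com.example.MyApp myapp.jar
```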
this from quite a while ago. Please let me know if
you need more info. Thanks
Regards,
Ajay
Sean, thank you for making it clear. It was helpful.
Regards,
Ajay
On Wednesday, October 26, 2016, Sean Owen <so...@cloudera.com> wrote:
> This usage is fine, because you are only using the HiveContext locally on
> the driver. It's applied in a function that's used on a Scal
Sunita, Thanks for your time. In my scenario, based on each attribute from
deDF(1 column with just 66 rows), I have to query a Hive table and insert
into another table.
Thanks,
Ajay
On Wed, Oct 26, 2016 at 12:21 AM, Sunita Arvind <sunitarv...@gmail.com>
wrote:
> Ajay,
>
>
(AsynchronousListenerBus.scala:64)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1181)
at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:63)
Thanks,
Ajay
On Tue, Oct 25, 2016 at 11:45 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>
org.apache.spark.sql.hive.HiveContext
and I see it is extending SqlContext which extends Logging with
Serializable.
Can anyone tell me if this is the right way to use it ? Thanks for your time.
Regards,
Ajay
sqlContext.sql("set hive.exec.dynamic.partition.mode=nonstrict")
val dataElementsFile = "hdfs://nameservice/user/ajay/spark/flds.txt"
//deDF has only 61 rows
val deDF =
sqlContext.read.text(dataElementsFile).toDF("DataElement").coalesce(1).distinct().cache()
deDF.wi
zadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Strange that Oracle table of 200Million plus rows has not been
>>>>>>> partitioned.
>>>>>>>
>>>>>>> What matter
y pointers are appreciated.
Thanks for your time.
~ Ajay
it in spark 2.0 ? I did search commits done in 2.0 branch and
looks like I need to use spark.sql.files.openCostInBytes but I am not sure.
Regards,
Ajay
Thanks for the confirmation Mich!
On Wednesday, June 22, 2016, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:
> Hi Ajay,
>
> I am afraid for now transaction heart beat do not work through Spark, so I
> have no other solution.
>
> This is interesting point as with Hive
.
Regards,
Ajay
On Thursday, June 2, 2016, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:
> thanks for that.
>
> I will have a look
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn *
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
providing the table name?
Yes I did that too. It did not made any difference.
Thank you,
Ajay
On Sunday, June 12, 2016, Mohit Jaggi <mohitja...@gmail.com> wrote:
> Looks like a bug in the code generating the SQL query…why would it be
> specific to SAS, I can’t guess. Did you
I tried implementing the same functionality through Scala as well. But no
luck so far. Just wondering if anyone here tried using Spark SQL to read
SAS dataset? Thank you
Regards,
Ajay
On Friday, June 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:
> Mich, I completely agree wi
mmy
> SELECT
> ID
> , CLUSTERED
> , SCATTERED
> , RANDOMISED
> , RANDOM_STRING
> , SMALL_VC
> , PADDING
> FROM tmp
> """
>HiveContext.sql(sqltext)
> println ("\nFinished at"); sqlCo
9:05,935] INFO ps(2.1)#executeQuery SELECT
"SR_NO","start_dt","end_dt" FROM sasLib.run_control ; created result set
2.1.1; time= 0.102 secs (com.sas.rio.MVAStatement:590)
Please find complete program and full logs attached in the below thread.
Thank you.
Regards,
Ajay
Hi again, anyone in this group tried to access SAS dataset through Spark
SQL ? Thank you
Regards,
Ajay
On Friday, June 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:
> Hi Spark Users,
>
> I hope everyone here are doing great.
>
> I am trying to read data from
Since both programs are using the same driver com.sas.rio.MVADriver .
Expected output should be same as my pure java programs output. But
something else is happening behind the scenes.
Any insights on this issue. Thanks for your time.
Regards,
Ajay
Spark Code to
extracts
> inherently.
> But you can maintain a file e.g. extractRange.conf in hdfs , to read from
> it the end range and update it with new end range from spark job before it
> finishes with the new relevant ranges to be used next time.
>
> On Tue, Jun 7, 2016 at 8:49 PM, Ajay C
>>> into hdfs
>>>
>>> perhaps there is some sort of spark 'connectors' that allows you to read
>>> data from a db directly so you dont need to go via spk streaming?
>>>
>>>
>>> hth
>>>
>>>
>>>
>>>
t allows you to read
>> data from a db directly so you dont need to go via spk streaming?
>>
>>
>> hth
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Tue, Jun 7, 2016 at 3:09 PM, Ajay Chander <itsche...@gmail.com
>>
Hi Spark users,
Right now we are using spark for everything(loading the data from
sqlserver, apply transformations, save it as permanent tables in hive) in
our environment. Everything is being done in one spark application.
The only thing we do before we launch our spark application through
Hi Vikash,
These are my thoughts, read the input directory using wholeTextFiles()
which would give a paired RDD with key as file name and value as file
content. Then you can apply a map function to read each line and append key
to the content.
Thank you,
Aj
On Tuesday, May 31, 2016, Vikash
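The steps above can be sketched as follows; tagLines is plain Scala (testable without a cluster), and the commented lines show the intended wholeTextFiles usage with a hypothetical input path:

```scala
object TagLines {
  // Pair every line of a file's content with the file name it came from.
  def tagLines(file: String, content: String): Seq[String] =
    content.split("\n").toSeq.filter(_.nonEmpty).map(line => s"$file\t$line")

  // Intended RDD usage (hypothetical path):
  // val tagged = sc.wholeTextFiles("hdfs:///input/dir")
  //   .flatMap { case (file, content) => tagLines(file, content) }
}
```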
Hi Everyone, Any insights on this thread? Thank you.
On Friday, May 27, 2016, Ajay Chander <itsche...@gmail.com> wrote:
> Hi Everyone,
>
>I have some data located on the EdgeNode. Right
> now, the process I follow to copy the data from Edgenode
Hi Everyone,
I have some data located on the EdgeNode. Right
now, the process I follow to copy the data from Edgenode to HDFS is through
a shellscript which resides on Edgenode. In Oozie I am using a SSH action
to execute the shell script on Edgenode which copies the
> This way we can narrow down where the issue is?
>
>
> Sent from my iPhone
>
> On May 23, 2016, at 5:26 PM, Ajay Chander <itsche...@gmail.com> wrote:
>
> I downloaded the spark 1.5 untilities and exported
Regards,
Aj
On Monday, May 23, 2016, Ajay Chander <itsche...@gmail.com> wrote:
> Hi Everyone,
>
> I am building a Java Spark application in eclipse IDE. From my application
> I want to use hiveContext to read tables from the remote Hive(Hadoop
> cluster). On my machine I hav
Hi Everyone,
I am building a Java Spark application in eclipse IDE. From my application
I want to use hiveContext to read tables from the remote Hive(Hadoop
cluster). On my machine I have exported $HADOOP_CONF_DIR =
{$HOME}/hadoop/conf/. This path has all the remote cluster conf details
like
Never mind! I figured it out by saving it as hadoopfile and passing the
codec to it. Thank you!
On Tuesday, May 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:
> Hi, I have a folder temp1 in hdfs which have multiple format files
> test1.txt, test2.avsc (Avro file) in it
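The fix described (passing a compression codec when saving) looks roughly like the fragment below; the output path and codec choice are illustrative, and rdd stands for the RDD being written:

```scala
// saveAsTextFile has an overload that takes a compression codec class.
// Path and codec are illustrative; `rdd` is the data being written out.
rdd.saveAsTextFile("hdfs:///user/ajay/temp1-compressed",
  classOf[org.apache.hadoop.io.compress.GzipCodec])
```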
it. Is there any possible/effiencient way to achieve this?
Thanks,
Aj
On Tuesday, May 10, 2016, Ajay Chander <itsche...@gmail.com> wrote:
> I will try that out. Thank you!
>
> On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com> wrote:
Hi Deepak,
Thanks for your response. If I am correct, you suggest reading all
of those files into an rdd on the cluster using wholeTextFiles then apply
compression codec on it, save the rdd to another Hadoop cluster?
Thank you,
Ajay
On Tuesday, May 10, 2016, Deepak Sharma <deepa
I will try that out. Thank you!
On Tuesday, May 10, 2016, Deepak Sharma <deepakmc...@gmail.com> wrote:
> Yes that's what I intended to say.
>
> Thanks
> Deepak
> On 10 May 2016 11:47 pm, "Ajay Chander" <itsche...@gmail.com> wrote:
Hi Everyone,
we are planning to migrate the data between 2 clusters and I see distcp
doesn't support data compression. Is there any efficient way to compress
the data during the migration ? Can I implement any spark job to do this ?
Thanks.
Mich,
Can you try the value for paymentdata to this
format paymentdata='2015-01-01 23:59:59' , to_date(paymentdate) and see if
it helps.
On Thursday, March 24, 2016, Tamas Szuromi
wrote:
> Hi Mich,
>
> Take a look
>
Hi Everyone, a quick question with in this context. What is the underneath
persistent storage that you guys are using? With regards to this
containerized environment? Thanks
On Thursday, March 10, 2016, yanlin wang wrote:
> How you guys make driver docker within container to be
Hi Ashok,
Try using hivecontext instead of sqlcontext. I suspect sqlcontext doesnot
have that functionality. Let me know if it works.
Thanks,
Ajay
On Friday, March 4, 2016, ashokkumar rajendran <
ashokkumar.rajend...@gmail.com> wrote:
> Hi Ayan,
>
> Thanks for the response.
Hi Sparklers,
Can you guys give an elaborate documentation of Spark UI as there are many
fields in it and we do not know much about it.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-UI-documentaton-needed-tp26300.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
Hi All,
I am running 3 executors in my spark streaming application with 3
cores per executors. I have written my custom receiver for receiving network
data.
In my current configuration I am launching 3 receivers , one receiver per
executor.
In the run if 2 of my executor dies, I am left
Hi Walrus,
Try caching the results just before calling the rdd.count.
Regards,
Ajay
> On Nov 13, 2015, at 7:56 PM, Walrus theCat <walrusthe...@gmail.com> wrote:
>
> Hi,
>
> I have an RDD which crashes the driver when being collected. I want to send
> the data on
that I
can get it upgraded through Ambari UI ? If possible can anyone point me to
a documentation online? Thank you.
Regards,
Ajay
On Wednesday, October 21, 2015, Saisai Shao <sai.sai.s...@gmail.com> wrote:
> Hi Frans,
>
> You could download Spark 1.5.1-hadoop 2.6 pre-built t
Hi Everyone,
I have a use case where I have to create a DataFrame inside the map()
function. To create a DataFrame it need sqlContext or hiveContext. Now how
do I pass the context to my map function ? And I am doing it in java. I
tried creating a class "TestClass" which implements "Function
(HistoryServer.scala:231)
at
org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
I went to the lib folder and noticed that
"spark-assembly-1.5.1-hadoop2.6.0.jar" is missing that class. I was able to
get the spark history server started with 1.3.1 but not 1.5.1. Any inputs
on this?
Really appreciate your help. Thanks
Hi Everyone,
Any one has any idea if spark-1.5.1 is available as a service on
HortonWorks ? I have spark-1.3.1 installed on the Cluster and it is a
HortonWorks distribution. Now I want upgrade it to spark-1.5.1. Anyone here
have any idea about it? Thank you in advance.
Regards,
Ajay
. Why
don't you see hdfs logs and see what's happening when your application is
talking to namenode? I suspect some networking issue or check if the
datanodes are running fine.
Thank you,
Ajay
On Saturday, October 3, 2015, Jacinto Arias <ja...@elrocin.es> wrote:
> Yes printing t
this helps!
Ajay
On Mon, Sep 14, 2015 at 1:21 PM, Ankur Srivastava <
ankur.srivast...@gmail.com> wrote:
> Hi Rachana
>
> I didn't get you r question fully but as the error says you can not
> perform a rdd transformation or action inside another transformation. In
> your examp
Hi David,
Thanks for responding! My main intention was to submit spark Job/jar to
yarn cluster from my eclipse with in the code. Is there any way that I
could pass my yarn configuration somewhere in the code to submit the jar to
the cluster?
Thank you,
Ajay
On Sunday, August 30, 2015, David
assumption is wrong or if I am missing
anything here.
I have attached the word count program that I was using. Any help is highly
appreciated.
Thank you,
Ajay
submit_spark_job
Description: Binary data
wysakowicz.da...@gmail.com wrote:
Hi Ajay,
In short story: No, there is no easy way to do that. But if you'd like to
play around this topic a good starting point would be this blog post from
sequenceIQ: blog
http://blog.sequenceiq.com/blog/2014/08/22/spark-submit
Hi Tim,
An option like spark.mesos.executor.max to cap the number of executors per
node/application would be very useful. However, having an option like
spark.mesos.executor.num
to specify desirable number of executors per node would provide even/much
better control.
Thanks,
Ajay
On Wed, Aug
specify desirable number of executors. If not
available, Mesos (in a simple implementation) can provide/offer whatever is
available. In a slightly complex implementation, we can build a simple
protocol to negotiate.
Regards,
Ajay
On Wed, Aug 12, 2015 at 5:51 PM, Tim Chen t...@mesosphere.io wrote
, and if needed, I will open a JIRA item.
I hope it helps.
Regards,
Ajay
On Mon, Aug 3, 2015 at 1:16 PM, Sujit Pal sujitatgt...@gmail.com wrote:
@Silvio: the mapPartitions instantiates a HttpSolrServer, then for each
query string in the partition, sends the query to Solr using SolrJ
this helps.
Ajay
On Thu, Jul 23, 2015 at 6:40 AM, Chintan Bhatt
chintanbhatt...@charusat.ac.in wrote:
Hi.
I'm facing following error while running .ova file containing Hortonworks
with Spark in Oracle VM Virtual Box:
Failed to open a session for the virtual machine *Hortonworks Sandbox
Hi Joji,
To my knowledge, Spark does not offer any such function.
I agree, defining a function to find an open (random) port would be a good
option. However, in order to invoke the corresponding SparkUI one needs
to know this port number.
Thanks,
Ajay
On Fri, Jul 24, 2015 at 10:19 AM, Joji
...
Thanks,
Ajay
On Fri, Jul 24, 2015 at 6:21 AM, Joji John jj...@ebates.com wrote:
*HI,*
*I am getting this error for some of spark applications. I have multiple
spark applications running in parallel. Is there a limit in the number of
spark applications that I can run in parallel
.
*Ajay Dubey*
Hi there!
It seems like you have Read/Execute access permission (and no
update/insert/delete access). What operation are you performing?
Ajay
On Jun 17, 2015, at 5:24 PM, nitinkak001 nitinkak...@gmail.com wrote:
I am trying to run a hive query from Spark code using HiveContext object
/tips/best-practices in
this regard?
Cheers!
Ajay
Thanks RK. I can turn on speculative execution but I am trying to find out
actual reason for delay as it happens on any node. Any idea about the stack
trace in my previous mail.
Regards,
Ajay
On Thursday, January 15, 2015 8:02 PM, RK prk...@yahoo.com.INVALID wrote:
If you don't want
Thanks Nicos. GC does not contribute much to the execution time of the task. I
will debug it further today.
Regards,
Ajay
On Thursday, January 15, 2015 11:55 PM, Nicos n...@hotmail.com wrote:
Ajay, Unless we are dealing with some synchronization/conditional variable bug
in Spark, try
)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.rdd.FilteredRDD.compute(FilteredRDD.scala:34)
Any inputs/suggestions to improve job time will be appreciated.
Regards,
Ajay
Setting spark.sql.hive.convertMetastoreParquet to true has fixed this.
Regards,
Ajay
On Tuesday, January 13, 2015 11:50 AM, Ajay Srivastava
a_k_srivast...@yahoo.com.INVALID wrote:
Hi, I am trying to read a parquet file using:
val parquetFile = sqlContext.parquetFile("people.parquet")
is reading all the columns from disk in case of table1 when it needs only
3 columns.
How should I make sure that it reads only 3 of 10 columns from disk ?
Regards,
Ajay
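As the reply above notes, with spark.sql.hive.convertMetastoreParquet=true Spark's native Parquet reader performs column pruning, so selecting only the needed columns avoids reading full rows. A hedged sketch (table and column names hypothetical):

```scala
// Config + query sketch; table/column names are illustrative.
sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "true")
val df = sqlContext.sql("SELECT col1, col2, col3 FROM table1") // only 3 of 10 columns read
```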
Hi,
Can we use Storm Streaming as RDD in Spark? Or any way to get Spark work
with Storm?
Thanks
Ajay
Hi,
The question is to do streaming in Spark with Storm (not using Spark
Streaming).
The idea is to use Spark as a in-memory computation engine and static data
coming from Cassandra/Hbase and streaming data from Storm.
Thanks
Ajay
On Tue, Dec 23, 2014 at 2:03 PM, Gerard Maas gerard.m
Right. I contacted the SummingBird users as well. It doesn't support Spark
streaming currently.
We are heading towards Storm as it is mostly widely used. Is Spark
streaming production ready?
Thanks
Ajay
On Tue, Dec 23, 2014 at 3:47 PM, Gerard Maas gerard.m...@gmail.com wrote:
I'm not aware
It takes around 0.6 second using Spark: either SELECT * FROM users WHERE
name='Anna', or javaFunctions(sc).cassandraTable("test", "people",
mapRowTo(Person.class)).where("name = ?", "Anna").
Please let me know if I am missing something in Spark configuration or
Cassandra-Spark Driver.
Thanks
Ajay Garga
Hadoop, HBase?. We may use
Cassandra/MongoDb/CouchBase as well.
4) Is Spark supports RDBMS too?. We can have a single interface to pull out
data from multiple data sources?
5) Any recommendations(not limited to usage of Spark) for our specific
requirement described above.
Thanks
Ajay
Note : I have
Yes that is my understanding of how it should work.
But in my case when I call collect first time, it reads the data from files
on the disk.
Subsequent collect queries are not reading data files ( Verified from the
logs.)
On spark ui I see only shuffle read and no shuffle write.
.
Since no data is cached in spark how is action on C is served without
reading data from disk.
Thanks
--Ajay
Hi,
I did not find any videos on apache spark channel in youtube yet.
Any idea when these will be made available ?
Regards,
Ajay
also explain the behavior of storage level - NONE ?
Regards,
Ajay
Hi All,
Is it possible to map and filter a javardd in a single operation?
Thanks
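For reference, a map and a filter can be fused into a single pass with flatMap (or collect with a partial function); the plain-collection version below behaves the same way as JavaRDD.flatMap/RDD.flatMap. The even-number/squaring logic is just an illustration:

```scala
object MapFilter {
  // One pass: keep even numbers and square them (filter + map fused via flatMap).
  def evenSquares(xs: Seq[Int]): Seq[Int] =
    xs.flatMap(x => if (x % 2 == 0) Some(x * x) else None)

  // Equivalent single pass using a partial function (RDD.collect works the same way).
  def evenSquares2(xs: Seq[Int]): Seq[Int] =
    xs.collect { case x if x % 2 == 0 => x * x }
}
```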
Thanks Mayur for the clarification.
Thanks Matei. We have tested the fix and it's working perfectly.
Andrew, we set spark.shuffle.spill=false but the application goes out of
memory. I think that is expected.
Regards,
Ajay
On Friday, June 6, 2014 3:49 AM, Andrew Ash and...@andrewash.com wrote:
Hi Ajay,
Can you please try
there
will not be any mismatch of jars. On two workers, since executor memory gets
doubled the code works fine.
Regards,
Ajay
On Thursday, June 5, 2014 1:35 AM, Matei Zaharia matei.zaha...@gmail.com
wrote:
If this isn’t the problem, it would be great if you can post the code for the
program.
Matei
and looks correct. But when single worker is used with two or more than two
cores, the result seems to be random. Every time, count of joined record is
different.
Does this sound like a defect or I need to take care of something while using
join ? I am using spark-0.9.1.
Regards
Ajay