Re: Num of executors and cores

2016-07-26 Thread Mail.com
am, > Jacek Laskowski > > https://medium.com/@jaceklaskowski/ > Mastering Apache Spark http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > >> On Tue, Jul 26, 2016 at 2:39 PM, Mail.com <pradeep.mi...@mail.com> wrote: >> Mor

Re: Num of executors and cores

2016-07-26 Thread Mail.com
; > https://medium.com/@jaceklaskowski/ > Mastering Apache Spark http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > >> On Tue, Jul 26, 2016 at 2:18 AM, Mail.com <pradeep.mi...@mail.com> wrote: >> Hi All, >> >> I

Re: spark context stop vs close

2016-07-25 Thread Mail.com
http://bit.ly/mastering-apache-spark >> Follow me at https://twitter.com/jaceklaskowski >> >> >>> On Sat, Jul 23, 2016 at 3:11 PM, Mail.com <pradeep.mi...@mail.com> wrote: >>> Hi All, >>> >>> Where should we us spark context stop vs clos

Num of executors and cores

2016-07-25 Thread Mail.com
Hi All, I have a directory which has 12 files. I want to read each file whole, so I am reading the directory with wholeTextFiles(dirpath, numPartitions). I run spark-submit with --num-executors 12 --executor-cores 1, and numPartitions is 12. However, when I run the job I see that the stage which reads the
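
A minimal Java sketch of this read pattern (paths are placeholders); the minPartitions argument is only a hint, so an explicit repartition after wholeTextFiles is one way to actually spread the 12 files across 12 single-core executors:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class WholeTextFilesExample {
        public static void main(String[] args) {
            // submitted e.g. with: spark-submit --num-executors 12 --executor-cores 1 ...
            SparkConf conf = new SparkConf().setAppName("WholeTextFilesExample");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // minPartitions is only a hint; small files may still be grouped together
            JavaPairRDD<String, String> files = sc.wholeTextFiles("/data/xml-dir", 12);

            // Force one partition per file so each executor core gets a task
            JavaPairRDD<String, String> spread = files.repartition(12);

            System.out.println("Partitions: " + spread.partitions().size());
            sc.stop();
        }
    }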

spark context stop vs close

2016-07-23 Thread Mail.com
Hi All, Where should we use spark context stop vs close? Should we stop the context first and then close? Are there general guidelines around this? When I stop and later try to close, I get an "RPC already closed" error. Thanks, Pradeep
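
For reference, a small sketch assuming the Java API, where JavaSparkContext implements java.io.Closeable and close() delegates to stop(), so a single shutdown call at the end is enough:

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class ContextShutdown {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("ContextShutdown");

            // try-with-resources calls close(), which in turn calls stop();
            // calling stop() yourself and then close() again is redundant and can
            // trigger "RPC already closed"-style errors on the second call
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                long count = sc.parallelize(Arrays.asList(1, 2, 3)).count();
                System.out.println("count = " + count);
            }
        }
    }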

Re: How to connect HBase and Spark using Python?

2016-07-22 Thread Mail.com
The HBase-Spark module will be available with HBase 2.0. Is that out yet? > On Jul 22, 2016, at 8:50 PM, Def_Os wrote: > > So it appears it should be possible to use HBase's new hbase-spark module, if > you follow this pattern: >

Spark Streaming - Direct Approach

2016-07-11 Thread Mail.com
Hi All, Can someone please confirm whether the streaming direct approach for reading Kafka is still experimental, or whether it can be used in production. I see the documentation and a talk from TD suggesting the advantages of the approach, but the docs state it is an "experimental" feature. Please suggest
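
For reference, a minimal direct-approach sketch against the Spark 1.6 / Kafka 0.8 integration (broker list and topic name are placeholders):

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    import kafka.serializer.StringDecoder;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    public class DirectStreamExample {
        public static void main(String[] args) throws Exception {
            SparkConf conf = new SparkConf().setAppName("DirectStreamExample");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, String> kafkaParams = new HashMap<>();
            kafkaParams.put("metadata.broker.list", "broker1:9092,broker2:9092");

            Set<String> topics = new HashSet<>(Arrays.asList("my-topic"));

            // Direct (receiver-less) stream: one RDD partition per Kafka partition
            JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
                    jssc, String.class, String.class,
                    StringDecoder.class, StringDecoder.class,
                    kafkaParams, topics);

            stream.map(tuple -> tuple._2()).print();

            jssc.start();
            jssc.awaitTermination();
        }
    }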

Running streaming applications in Production environment

2016-06-14 Thread Mail.com
Hi All, Can you please advise best practices for running streaming jobs in production that read from Kafka. How do we trigger them - through a start script? And what are the best ways to monitor that the application is running and to send an alert when it is down, etc. Thanks, Pradeep

Re: Kafka connection logs in Spark

2016-05-26 Thread Mail.com
beta > consumer for kafka 0.10 > >> On Wed, May 25, 2016 at 9:41 PM, Mail.com <pradeep.mi...@mail.com> wrote: >> Hi All, >> >> I am connecting Spark 1.6 streaming to Kafka 0.8.2 with Kerberos. I ran >> spark streaming in debug mode, but do not see any log sa

Kafka connection logs in Spark

2016-05-25 Thread Mail.com
Hi All, I am connecting Spark 1.6 streaming to Kafka 0.8.2 with Kerberos. I ran spark streaming in debug mode, but do not see any log saying it connected to Kafka or the topic, etc. How can I enable that? My spark streaming job runs but no messages are fetched from the RDD. Please suggest.
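
One way to surface that logging is to raise the relevant loggers to DEBUG, sketched here programmatically (the same settings can go into a log4j.properties shipped to the executors); the logger names below are the usual Spark/Kafka 0.8 package prefixes and may need adjusting for your build:

    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;

    public class KafkaDebugLogging {
        // Call once early in the driver (and ship a log4j.properties for executors)
        public static void enable() {
            // Spark's Kafka integration classes
            Logger.getLogger("org.apache.spark.streaming.kafka").setLevel(Level.DEBUG);
            // Kafka 0.8 consumer / client internals (connections, fetchers, offsets)
            Logger.getLogger("kafka").setLevel(Level.DEBUG);
            Logger.getLogger("kafka.consumer").setLevel(Level.DEBUG);
        }
    }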

Re: rpc.RpcTimeoutException: Futures timed out after [120 seconds]

2016-05-20 Thread Mail.com
Yes. Sent from my iPhone > On May 20, 2016, at 10:11 AM, Sahil Sareen <sareen...@gmail.com> wrote: > > I'm not sure if this happens on small files or big ones as I have a mix of > them always. > Did you see this only for big files? > >> On Fri, May 20, 2016 at

Re: rpc.RpcTimeoutException: Futures timed out after [120 seconds]

2016-05-20 Thread Mail.com
Hi Sahil, I have seen this with high GC time. Do you ever get this error with small-volume files? Pradeep > On May 20, 2016, at 9:32 AM, Sahil Sareen wrote: > > Hey all > > I'm using Spark-1.6.1 and occasionally seeing executors lost and hurting my > application
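
If the timeouts line up with long GC pauses, one common mitigation (a sketch that buys headroom rather than fixing the underlying GC pressure) is to raise the RPC timeouts, whose 120-second default matches the error message:

    import org.apache.spark.SparkConf;

    public class TimeoutTuning {
        public static SparkConf withLongerTimeouts(SparkConf conf) {
            // spark.network.timeout defaults to 120s, which is where the
            // "Futures timed out after [120 seconds]" message comes from
            return conf
                    .set("spark.network.timeout", "600s")
                    .set("spark.executor.heartbeatInterval", "60s");
        }
    }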

Re: KafkaUtils.createDirectStream Not Fetching Messages with Confluent Serializers as Value Decoder.

2016-05-19 Thread Mail.com
I noticed when you specify invalid topic name, KafkaUtils doesn't > fetch any messages. So, check you have specified the topic name correctly. > > ~Muthu > ____ > From: Mail.com [pradeep.mi...@mail.com] > Sent: Monday, May 16, 2016 9:33 PM >

Re: Filter out the elements from xml file in Spark

2016-05-19 Thread Mail.com
Hi Yogesh, Can you try a map operation, with whatever parser you are using, and get what you need? You could also look at the spark-xml package. Thanks, Pradeep > On May 19, 2016, at 4:39 AM, Yogesh Vyas wrote: > > Hi, > I had xml files which I am reading through textFileStream,

Re: KafkaUtils.createDirectStream Not Fetching Messages with Confluent Serializers as Value Decoder.

2016-05-18 Thread Mail.com
Adding back users. > On May 18, 2016, at 11:49 AM, Mail.com <pradeep.mi...@mail.com> wrote: > > Hi Uladzimir, > > I run is as below. > > Spark-submit --class com.test --num-executors 4 --executor-cores 5 --queue > Dev --master yarn-client --driver-memory 512M -

Re: KafkaUtils.createDirectStream Not Fetching Messages with Confluent Serializers as Value Decoder.

2016-05-16 Thread Mail.com
Hi Muthu, Are you on Spark 1.4.1 and Kafka 0.8.2? I have a similar issue even for simple string messages. The console producer and consumer work fine, but Spark always returns empty RDDs. I am using the receiver-based approach. Thanks, Pradeep > On May 16, 2016, at 8:19 PM, Ramaswamy, Muthuraman >
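
For comparison with the console consumer, a minimal receiver-based sketch for Spark 1.x / Kafka 0.8 (ZooKeeper quorum, group id, and topic name are placeholders):

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    public class ReceiverStreamExample {
        public static void main(String[] args) throws Exception {
            SparkConf conf = new SparkConf().setAppName("ReceiverStreamExample");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            // topic -> number of consumer threads in the receiver
            Map<String, Integer> topics = new HashMap<>();
            topics.put("my-topic", 1);

            JavaPairReceiverInputDStream<String, String> stream =
                    KafkaUtils.createStream(jssc, "zk1:2181,zk2:2181", "my-consumer-group", topics);

            // If this prints zero batches, check the topic name, the consumer group,
            // and that the receiver actually registered in the Streaming UI
            stream.count().print();

            jssc.start();
            jssc.awaitTermination();
        }
    }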

Re: Executors and Cores

2016-05-15 Thread Mail.com
8Pw > > http://talebzadehmich.wordpress.com > > >> On 15 May 2016 at 13:19, Mail.com <pradeep.mi...@mail.com> wrote: >> Hi , >> >> I have seen multiple videos on spark tuning which shows how to determine # >> cores, #executors and memory size of

Executors and Cores

2016-05-15 Thread Mail.com
Hi, I have seen multiple videos on Spark tuning that show how to determine the number of cores, number of executors, and memory size for a job. In all that I have seen, it seems each job has to be given the max resources allowed in the cluster. How do we factor in input size as well? I am processing a 1gb
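
As a rough illustration (the cluster numbers here are assumptions, not a recommendation): on 6 worker nodes with 16 cores and 64 GB each, leaving 1 core and ~1 GB per node for the OS and Hadoop daemons and using 5 cores per executor gives (6 x 15) / 5 = 18 executors; reserving one slot for the YARN ApplicationMaster leaves roughly --num-executors 17 --executor-cores 5 --executor-memory 19g (63 GB / 3 executors per node, minus ~7% for memory overhead). Input size then mostly drives the number of partitions rather than the executor count: a 1 GB input at a 128 MB split size is only about 8 tasks, so it cannot keep more than 8 cores busy in the read stage unless it is repartitioned.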

Spark 1.4.1 + Kafka 0.8.2 with Kerberos

2016-05-13 Thread Mail.com
Hi All, I am trying to get Spark 1.4.1 (Java) to work with Kafka 0.8.2 in a Kerberos-enabled cluster (HDP 2.3.2). Is there any document I can refer to? Thanks, Pradeep

Re: XML Processing using Spark SQL

2016-05-12 Thread Mail.com
Hi Arun, Could you try using StAX or JAXB? Thanks, Pradeep > On May 12, 2016, at 8:35 PM, Hyukjin Kwon wrote: > > Hi Arunkumar, > > > I guess your records are self-closing ones. > > There is an issue open here, https://github.com/databricks/spark-xml/issues/92 > >
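
A minimal StAX sketch of the kind being suggested, assuming each record is available as one XML string; the element name here is invented for illustration:

    import java.io.StringReader;
    import java.util.ArrayList;
    import java.util.List;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.XMLStreamReader;

    public class StaxParser {
        // Extracts the text of every <name> element from one XML record
        public static List<String> extractNames(String xml) throws XMLStreamException {
            List<String> names = new ArrayList<>();
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "name".equals(reader.getLocalName())) {
                    names.add(reader.getElementText());
                }
            }
            reader.close();
            return names;
        }
    }

Inside Spark this would typically be called from a map over an RDD of XML strings.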

Re: Spark-csv- partitionBy

2016-05-10 Thread Mail.com
> it's supported, try to use coalesce(1) (the spelling is wrong) and after that > do the partitions. > > Regards, > Gourav > >> On Mon, May 9, 2016 at 7:12 PM, Mail.com <pradeep.mi...@mail.com> wrote: >> Hi, >> >> I have to write tab delimited file an

Spark-csv- partitionBy

2016-05-09 Thread Mail.com
Hi, I have to write a tab-delimited file and need one directory for each unique value of a column. I tried using spark-csv with partitionBy, and it seems it is not supported. Is there any other option available for doing this? Regards, Pradeep
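
spark-csv 1.x indeed has no partitionBy; one workaround (a sketch with made-up column and path names, assuming the key column is a string) is to collect the distinct key values and write one tab-delimited directory per value, at the cost of one pass over the data per key:

    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.Row;

    public class WritePerKey {
        // Writes one tab-delimited output directory per distinct value of keyColumn
        public static void write(DataFrame df, String keyColumn, String baseDir) {
            for (Row row : df.select(keyColumn).distinct().collect()) {
                String value = row.getString(0);
                df.filter(df.col(keyColumn).equalTo(value))
                  .write()
                  .format("com.databricks.spark.csv")
                  .option("delimiter", "\t")
                  .save(baseDir + "/" + keyColumn + "=" + value);
            }
        }
    }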

Re: Error in spark-xml

2016-05-02 Thread Mail.com
Can you try creating your own schema file and using it to read the XML? I had a similar issue but resolved it with a custom schema, specifying each attribute in it. Pradeep > On May 1, 2016, at 9:45 AM, Hyukjin Kwon wrote: > > To be more clear, > > If
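
A sketch of that custom-schema approach with spark-xml in Java, assuming a rowTag of "record" and invented attribute names; attributes are declared with spark-xml's attribute prefix (an underscore by default):

    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructField;
    import org.apache.spark.sql.types.StructType;

    public class XmlWithSchema {
        public static DataFrame load(SQLContext sqlContext, String path) {
            // Every attribute and element is listed explicitly instead of relying on inference
            StructType schema = DataTypes.createStructType(new StructField[] {
                DataTypes.createStructField("_id", DataTypes.StringType, true),
                DataTypes.createStructField("_type", DataTypes.StringType, true),
                DataTypes.createStructField("value", DataTypes.StringType, true)
            });

            return sqlContext.read()
                    .format("com.databricks.spark.xml")
                    .option("rowTag", "record")
                    .schema(schema)
                    .load(path);
        }
    }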

Re: JavaSparkContext.wholeTextFiles read directory

2016-04-26 Thread Mail.com
hat. I don’t think > wholeTextFile is designed for that. > > - Harjit >> On Apr 26, 2016, at 7:19 PM, Mail.com <pradeep.mi...@mail.com> wrote: >> >> >> Hi All, >> I am reading entire directory of gz XML files with wholeTextFiles. >> >

JavaSparkContext.wholeTextFiles read directory

2016-04-26 Thread Mail.com
Hi All, I am reading an entire directory of gz XML files with wholeTextFiles. I understand that since they are gz and read with wholeTextFiles the individual files are not splittable, but why is the entire directory read by one executor in a single task? I have provided the number of executors as the number of files in that

Create tab separated file from a dataframe spark 1.4 with Java

2016-04-21 Thread Mail.com
Hi, I have a dataframe and need to write it to a tab-separated file using Spark 1.4 and Java. Can someone please suggest? Thanks, Pradeep
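
One way to do this with the spark-csv package on Spark 1.4 (package coordinates and output path are assumptions):

    import org.apache.spark.sql.DataFrame;

    public class TsvWriter {
        // Requires the spark-csv package on the classpath, e.g.
        // spark-submit --packages com.databricks:spark-csv_2.10:1.4.0 ...
        public static void saveAsTsv(DataFrame df, String outputDir) {
            df.write()
              .format("com.databricks.spark.csv")
              .option("delimiter", "\t")
              .option("header", "false")
              .save(outputDir);
        }
    }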

Re: spark on yarn

2016-04-20 Thread Mail.com
I get an error with a message that states the max number of cores allowed. > On Apr 20, 2016, at 11:21 AM, Shushant Arora > wrote: > > I am running a spark application on yarn cluster. > > say I have available vcors in cluster as 100.And I start spark

Re: Parse XML using java spark

2016-04-18 Thread Mail.com
You might look at using JAXB or StAX. If it is simple enough, use the data frame's auto-generated schema. Pradeep > On Apr 18, 2016, at 6:37 PM, Jinan Alhajjaj wrote: > > Thank you for your help. > I would like to parse the XML file using Java not scala . Can you please
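
If JAXB fits, a minimal Java sketch; the Record class and its fields are invented for illustration, and in Spark this would usually run inside a map over the XML strings:

    import java.io.StringReader;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Unmarshaller;
    import javax.xml.bind.annotation.XmlElement;
    import javax.xml.bind.annotation.XmlRootElement;

    public class JaxbParser {

        @XmlRootElement(name = "record")
        public static class Record {
            @XmlElement public String id;
            @XmlElement public String name;
        }

        // Parses one XML record string into a Record object
        public static Record parse(String xml) throws Exception {
            Unmarshaller unmarshaller = JAXBContext.newInstance(Record.class).createUnmarshaller();
            return (Record) unmarshaller.unmarshal(new StringReader(xml));
        }
    }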