Re: Num of executors and cores

2016-07-26 Thread Mail.com
Hi,

In spark submit, I specify --master yarn-client.
When I go to the Executors tab in the UI, I do see all 12 executors assigned. 
But when I drill down to the tasks for the stage, I saw only 8 tasks, with indices 0-7.

I ran the job again with the number of executors increased to 15, and I now see 12 tasks 
for the stage.

I would still like to understand why there were only 8 tasks for the stage even though 
12 executors were available. 

Thanks,
Pradeep
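
For reference, a minimal sketch (Spark 1.x Java API; the path and partition counts are placeholders) that checks how many partitions wholeTextFiles actually produced and forces one task per file by repartitioning. With wholeTextFiles the numPartitions argument is only a suggested minimum, so several small files can end up packed into one partition, which is why fewer tasks show up in the UI:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class WholeTextFilesPartitions {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("wholeTextFiles-partitions");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // numPartitions is only a hint here; fewer partitions may be created for small files.
        JavaPairRDD<String, String> files = sc.wholeTextFiles("hdfs:///path/to/dir", 12);
        System.out.println("partitions after read: " + files.partitions().size());

        // Force one partition (and hence one task) per file before the heavy parsing step.
        JavaPairRDD<String, String> spread = files.repartition(12);
        System.out.println("partitions after repartition: " + spread.partitions().size());

        sc.stop();
    }
}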



> On Jul 26, 2016, at 8:46 AM, Jacek Laskowski <ja...@japila.pl> wrote:
> 
> Hi,
> 
> Where's this yarn-client mode specified? When you said "However, when
> I run the job I see that the stage which reads the directory has only
> 8 tasks." -- how do you see 8 tasks for a stage? It appears you're in
> local[*] mode on a 8-core machine (like me) and that's why I'm asking
> such basic questions.
> 
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
> 
> 
>> On Tue, Jul 26, 2016 at 2:39 PM, Mail.com <pradeep.mi...@mail.com> wrote:
>> More of jars and files and app name. It runs on yarn-client mode.
>> 
>> Thanks,
>> Pradeep
>> 
>>> On Jul 26, 2016, at 7:10 AM, Jacek Laskowski <ja...@japila.pl> wrote:
>>> 
>>> Hi,
>>> 
>>> What's ""? What master URL do you use?
>>> 
>>> Pozdrawiam,
>>> Jacek Laskowski
>>> 
>>> https://medium.com/@jaceklaskowski/
>>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>> Follow me at https://twitter.com/jaceklaskowski
>>> 
>>> 
>>>> On Tue, Jul 26, 2016 at 2:18 AM, Mail.com <pradeep.mi...@mail.com> wrote:
>>>> Hi All,
>>>> 
>>>> I have a directory which has 12 files. I want to read the entire file so I 
>>>> am reading it as wholeTextFiles(dirpath, numPartitions).
>>>> 
>>>> I run spark-submit as  --num-executors 12 
>>>> --executor-cores 1 and numPartitions 12.
>>>> 
>>>> However, when I run the job I see that the stage which reads the directory 
>>>> has only 8 tasks. So some task reads more than one file and takes twice 
>>>> the time.
>>>> 
>>>> What can I do that the files are read by 12 tasks  I.e one file per task.
>>>> 
>>>> Thanks,
>>>> Pradeep
>>>> 
>>>> -
>>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>> 
>>> -
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Num of executors and cores

2016-07-26 Thread Mail.com
The rest is mostly jars, files, and the app name. It runs in yarn-client mode.

Thanks,
Pradeep

> On Jul 26, 2016, at 7:10 AM, Jacek Laskowski <ja...@japila.pl> wrote:
> 
> Hi,
> 
> What's ""? What master URL do you use?
> 
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
> 
> 
>> On Tue, Jul 26, 2016 at 2:18 AM, Mail.com <pradeep.mi...@mail.com> wrote:
>> Hi All,
>> 
>> I have a directory which has 12 files. I want to read the entire file so I 
>> am reading it as wholeTextFiles(dirpath, numPartitions).
>> 
>> I run spark-submit as  --num-executors 12 --executor-cores 
>> 1 and numPartitions 12.
>> 
>> However, when I run the job I see that the stage which reads the directory 
>> has only 8 tasks. So some task reads more than one file and takes twice the 
>> time.
>> 
>> What can I do that the files are read by 12 tasks  I.e one file per task.
>> 
>> Thanks,
>> Pradeep
>> 
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: spark context stop vs close

2016-07-25 Thread Mail.com
Okay. Yes, it is JavaSparkContext. Thanks.

> On Jul 24, 2016, at 1:31 PM, Sean Owen <so...@cloudera.com> wrote:
> 
> I think this is about JavaSparkContext which implements the standard
> Closeable interface for convenience. Both do exactly the same thing.
> 
>> On Sun, Jul 24, 2016 at 6:27 PM, Jacek Laskowski <ja...@japila.pl> wrote:
>> Hi,
>> 
>> I can only find stop. Where did you find close?
>> 
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>> 
>> 
>>> On Sat, Jul 23, 2016 at 3:11 PM, Mail.com <pradeep.mi...@mail.com> wrote:
>>> Hi All,
>>> 
>>> Where should we us spark context stop vs close. Should we stop the context 
>>> first and then close.
>>> 
>>> Are general guidelines around this. When I stop and later try to close I 
>>> get RPC already closed error.
>>> 
>>> Thanks,
>>> Pradeep
>>> 
>>> 
>>> 
>>> 
>>> -
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>> 
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Num of executors and cores

2016-07-25 Thread Mail.com
Hi All,

I have a directory which has 12 files. I want to read each file in its entirety, so I am 
reading them with wholeTextFiles(dirpath, numPartitions).

I run spark-submit with --num-executors 12 --executor-cores 1, 
and numPartitions set to 12.

However, when I run the job I see that the stage which reads the directory has 
only 8 tasks, so some tasks read more than one file and take twice the time.

What can I do so that the files are read by 12 tasks, i.e., one file per task?

Thanks,
Pradeep

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



spark context stop vs close

2016-07-23 Thread Mail.com
Hi All,

When should we use SparkContext stop() vs. close()? Should we stop the context 
first and then close it?

Are there general guidelines around this? When I stop and later try to close, I get an 
"RPC already closed" error.

Thanks,
Pradeep
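
As noted earlier in the thread, JavaSparkContext implements the standard java.io.Closeable interface and close() does the same thing as stop(), so calling only one of them avoids the "RPC already closed" error. A minimal sketch (Java API, trivial job just to make it runnable) using try-with-resources:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class StopVsClose {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("stop-vs-close");
        // close() is invoked automatically at the end of the block; do not also call stop().
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            long count = sc.parallelize(Arrays.asList(1, 2, 3)).count();
            System.out.println("count = " + count);
        }
    }
}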




-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: How to connect HBase and Spark using Python?

2016-07-22 Thread Mail.com
The hbase-spark module will be available with HBase 2.0. Is that out yet?

> On Jul 22, 2016, at 8:50 PM, Def_Os  wrote:
> 
> So it appears it should be possible to use HBase's new hbase-spark module, if
> you follow this pattern:
> https://hbase.apache.org/book.html#_sparksql_dataframes
> 
> Unfortunately, when I run my example from PySpark, I get the following
> exception:
> 
> 
>> py4j.protocol.Py4JJavaError: An error occurred while calling o120.save.
>> : java.lang.RuntimeException: org.apache.hadoop.hbase.spark.DefaultSource
>> does not allow create table as select.
>>at scala.sys.package$.error(package.scala:27)
>>at
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:259)
>>at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
>>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>at java.lang.reflect.Method.invoke(Method.java:606)
>>at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
>>at py4j.Gateway.invoke(Gateway.java:259)
>>at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>>at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>at py4j.GatewayConnection.run(GatewayConnection.java:209)
>>at java.lang.Thread.run(Thread.java:745)
> 
> Even when I created the table in HBase first, it still failed.
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-connect-HBase-and-Spark-using-Python-tp27372p27397.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Spark Streaming - Direct Approach

2016-07-11 Thread Mail.com
Hi All,

Can someone please confirm whether the Spark Streaming direct approach for reading from 
Kafka is still experimental, or whether it can be used in production.

I have seen the documentation and a talk from TD describing the advantages of the 
approach, but the docs state it is an "experimental" feature. 

Please suggest

Thanks,
Pradeep
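
For context, a minimal sketch of the direct (receiver-less) approach with the Spark 1.6 / Kafka 0.8 integration; the broker list and topic name are placeholders:

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import kafka.serializer.StringDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class DirectStreamSketch {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("direct-stream-sketch");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "broker1:9092,broker2:9092");
        Set<String> topics = new HashSet<>(Arrays.asList("my-topic"));

        // Direct stream: one RDD partition per Kafka partition, offsets tracked by Spark.
        JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
                jssc, String.class, String.class,
                StringDecoder.class, StringDecoder.class,
                kafkaParams, topics);

        stream.print();   // print the first few records of each batch

        jssc.start();
        jssc.awaitTermination();
    }
}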

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Running streaming applications in Production environment

2016-06-14 Thread Mail.com
Hi All,

Can you please advise on best practices for running streaming jobs in production 
that read from Kafka.

How do we trigger them (through a start script?), and what are the best ways to monitor 
that the application is running and to send an alert when it is down?

Thanks,
Pradeep
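
One common building block on the application side (a sketch only, not a full operations setup; the checkpoint directory, batch interval, and the socket source are placeholders) is checkpoint-based driver recovery plus graceful shutdown, so a restart script or a scheduler can simply relaunch the job after a failure:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.api.java.JavaStreamingContextFactory;

public class ProductionStreamingSketch {
    private static final String CHECKPOINT_DIR = "hdfs:///checkpoints/my-streaming-app";

    public static void main(String[] args) throws Exception {
        JavaStreamingContextFactory factory = new JavaStreamingContextFactory() {
            @Override
            public JavaStreamingContext create() {
                SparkConf conf = new SparkConf()
                        .setAppName("my-streaming-app")
                        // Finish in-flight batches before stopping on shutdown.
                        .set("spark.streaming.stopGracefullyOnShutdown", "true");
                JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));
                jssc.checkpoint(CHECKPOINT_DIR);
                // Placeholder source; in production this would be the Kafka input stream.
                jssc.socketTextStream("localhost", 9999).print();
                return jssc;
            }
        };

        // Recover from the checkpoint if one exists, otherwise build a fresh context.
        JavaStreamingContext jssc = JavaStreamingContext.getOrCreate(CHECKPOINT_DIR, factory);
        jssc.start();
        jssc.awaitTermination();
    }
}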






-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Kafka connection logs in Spark

2016-05-26 Thread Mail.com
Hi Cody,

I used Hortonworks jars for Spark Streaming that enable getting messages 
from Kafka with Kerberos.

Thanks,
Pradeep


> On May 26, 2016, at 11:04 AM, Cody Koeninger <c...@koeninger.org> wrote:
> 
> I wouldn't expect kerberos to work with anything earlier than the beta
> consumer for kafka 0.10
> 
>> On Wed, May 25, 2016 at 9:41 PM, Mail.com <pradeep.mi...@mail.com> wrote:
>> Hi All,
>> 
>> I am connecting Spark 1.6 streaming  to Kafka 0.8.2 with Kerberos. I ran 
>> spark streaming in debug mode, but do not see any log saying it connected to 
>> Kafka or  topic etc. How could I enable that.
>> 
>> My spark streaming job runs but no messages are fetched from the RDD. Please 
>> suggest.
>> 
>> Thanks,
>> Pradeep
>> 
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Kafka connection logs in Spark

2016-05-25 Thread Mail.com
Hi All,

I am connecting Spark 1.6 Streaming to Kafka 0.8.2 with Kerberos. I ran Spark 
Streaming in debug mode, but I do not see any log saying it connected to Kafka, the 
topic, etc. How can I enable that? 

My Spark Streaming job runs, but no messages are fetched into the RDD. Please 
suggest.

Thanks,
Pradeep
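
One way to surface the connection details (a sketch; the logger names below are the usual package names for the Kafka 0.8 client and Spark's Kafka integration and may need adjusting for a particular distribution) is to raise the log level for those packages, either in log4j.properties or programmatically with the log4j 1.x API that Spark bundles:

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class KafkaLogLevels {
    public static void enableKafkaDebugLogging() {
        // Kafka 0.8 client classes live under the "kafka" package.
        Logger.getLogger("kafka").setLevel(Level.DEBUG);
        // Spark's Kafka receiver / direct-stream integration.
        Logger.getLogger("org.apache.spark.streaming.kafka").setLevel(Level.DEBUG);
    }
}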

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: rpc.RpcTimeoutException: Futures timed out after [120 seconds]

2016-05-20 Thread Mail.com
Yes.

Sent from my iPhone

> On May 20, 2016, at 10:11 AM, Sahil Sareen <sareen...@gmail.com> wrote:
> 
> I'm not sure if this happens on small files or big ones as I have a mix of 
> them always.
> Did you see this only for big files?
> 
>> On Fri, May 20, 2016 at 7:36 PM, Mail.com <pradeep.mi...@mail.com> wrote:
>> Hi Sahil,
>> 
>> I have seen this with high GC time. Do you ever get this error with small 
>> volume files
>> 
>> Pradeep
>> 
>>> On May 20, 2016, at 9:32 AM, Sahil Sareen <sareen...@gmail.com> wrote:
>>> 
>>> Hey all
>>> 
>>> I'm using Spark-1.6.1 and occasionally seeing executors lost and hurting my 
>>> application performance due to these errors.
>>> Can someone please let out all the possible problems that could cause this?
>>> 
>>> 
>>> Full log:
>>> 
>>> 16/05/19 02:17:54 ERROR ContextCleaner: Error cleaning broadcast 266685
>>> org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 
>>> seconds]. This timeout is controlled by spark.rpc.askTimeout
>>> at 
>>> org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcEnv.scala:214)
>>> at 
>>> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:229)
>>> at 
>>> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:225)
>>> at 
>>> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>>> at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcEnv.scala:242)
>>> at 
>>> org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:136)
>>> at 
>>> org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:228)
>>> at 
>>> org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
>>> at 
>>> org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:67)
>>> at 
>>> org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:214)
>>> at 
>>> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:170)
>>> at 
>>> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:161)
>>> at scala.Option.foreach(Option.scala:257)
>>> at 
>>> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:161)
>>> at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1136)
>>> at 
>>> org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:154)
>>> at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:67)
>>> Caused by: java.util.concurrent.TimeoutException: Futures timed out after 
>>> [120 seconds]
>>> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>> at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
>>> at 
>>> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>> at scala.concurrent.Await$.result(package.scala:190)
>>> at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcEnv.scala:241)
>>> ... 12 more
>>> 16/05/19 02:18:26 ERROR TaskSchedulerImpl: Lost executor 
>>> 20160421-192532-1677787146-5050-40596-S23 on ip-10-0-1-70.ec2.internal: 
>>> Executor heartbeat timed out after 161447 ms
>>> 16/05/19 02:18:53 ERROR TaskSchedulerImpl: Lost executor 
>>> 20160421-192532-1677787146-5050-40596-S23 on ip-10-0-1-70.ec2.internal: 
>>> remote Rpc client disassociated
>>> 
>>> Thanks
>>> Sahil
> 


Re: rpc.RpcTimeoutException: Futures timed out after [120 seconds]

2016-05-20 Thread Mail.com
Hi Sahil,

I have seen this with high GC times. Do you ever get this error with small-volume 
files?

Pradeep
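
If long GC pauses are indeed the cause, one common mitigation alongside memory tuning (an assumption here, not something verified against this particular job) is to relax the relevant timeouts so cleaner and heartbeat calls survive the pauses, e.g.:

import org.apache.spark.SparkConf;

public class TimeoutTuning {
    public static SparkConf withRelaxedTimeouts(SparkConf conf) {
        return conf
                // Default is 120s; also the fallback value for spark.rpc.askTimeout.
                .set("spark.network.timeout", "300s")
                // How often executors send heartbeats; keep it well below spark.network.timeout.
                .set("spark.executor.heartbeatInterval", "30s");
    }
}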

> On May 20, 2016, at 9:32 AM, Sahil Sareen  wrote:
> 
> Hey all
> 
> I'm using Spark-1.6.1 and occasionally seeing executors lost and hurting my 
> application performance due to these errors.
> Can someone please let out all the possible problems that could cause this?
> 
> 
> Full log:
> 
> 16/05/19 02:17:54 ERROR ContextCleaner: Error cleaning broadcast 266685
> org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [120 
> seconds]. This timeout is controlled by spark.rpc.askTimeout
> at 
> org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcEnv.scala:214)
> at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:229)
> at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcEnv.scala:225)
> at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
> at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcEnv.scala:242)
> at 
> org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:136)
> at 
> org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:228)
> at 
> org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
> at 
> org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:67)
> at 
> org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:214)
> at 
> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:170)
> at 
> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$2.apply(ContextCleaner.scala:161)
> at scala.Option.foreach(Option.scala:257)
> at 
> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:161)
> at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1136)
> at 
> org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:154)
> at org.apache.spark.ContextCleaner$$anon$3.run(ContextCleaner.scala:67)
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after 
> [120 seconds]
> at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
> at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
> at 
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> at scala.concurrent.Await$.result(package.scala:190)
> at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcEnv.scala:241)
> ... 12 more
> 16/05/19 02:18:26 ERROR TaskSchedulerImpl: Lost executor 
> 20160421-192532-1677787146-5050-40596-S23 on ip-10-0-1-70.ec2.internal: 
> Executor heartbeat timed out after 161447 ms
> 16/05/19 02:18:53 ERROR TaskSchedulerImpl: Lost executor 
> 20160421-192532-1677787146-5050-40596-S23 on ip-10-0-1-70.ec2.internal: 
> remote Rpc client disassociated
> 
> Thanks
> Sahil


Re: KafkaUtils.createDirectStream Not Fetching Messages with Confluent Serializers as Value Decoder.

2016-05-19 Thread Mail.com
Hi Muthu,

Do you have Kerberos enabled?

Thanks,
Pradeep

> On May 19, 2016, at 12:17 AM, Ramaswamy, Muthuraman 
> <muthuraman.ramasw...@viasat.com> wrote:
> 
> I am using Spark 1.6.1 and Kafka 0.9+ It works for both receiver and 
> receiver-less mode.
> 
> One thing I noticed when you specify invalid topic name, KafkaUtils doesn't 
> fetch any messages. So, check you have specified the topic name correctly.
> 
> ~Muthu
> ____
> From: Mail.com [pradeep.mi...@mail.com]
> Sent: Monday, May 16, 2016 9:33 PM
> To: Ramaswamy, Muthuraman
> Cc: Cody Koeninger; spark users
> Subject: Re: KafkaUtils.createDirectStream Not Fetching Messages with 
> Confluent Serializers as Value Decoder.
> 
> Hi Muthu,
> 
> Are you on spark 1.4.1 and Kafka 0.8.2? I have a similar issue even for 
> simple string messages.
> 
> Console producer and consumer work fine. But spark always reruns empty RDD. I 
> am using Receiver based Approach.
> 
> Thanks,
> Pradeep
> 
>> On May 16, 2016, at 8:19 PM, Ramaswamy, Muthuraman 
>> <muthuraman.ramasw...@viasat.com> wrote:
>> 
>> Yes, I can see the messages. Also, I wrote a quick custom decoder for avro 
>> and it works fine for the following:
>> 
>>>> kvs = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": 
>>>> brokers}, valueDecoder=decoder)
>> 
>> But, when I use the Confluent Serializers to leverage the Schema Registry 
>> (based on the link shown below), it doesn’t work for me. I am not sure 
>> whether I need to configure any more details to consume the Schema Registry. 
>> I can fetch the schema from the schema registry based on is Ids. The decoder 
>> method is not returning any values for me.
>> 
>> ~Muthu
>> 
>> 
>> 
>>> On 5/16/16, 10:49 AM, "Cody Koeninger" <c...@koeninger.org> wrote:
>>> 
>>> Have you checked to make sure you can receive messages just using a
>>> byte array for value?
>>> 
>>> On Mon, May 16, 2016 at 12:33 PM, Ramaswamy, Muthuraman
>>> <muthuraman.ramasw...@viasat.com> wrote:
>>>> I am trying to consume AVRO formatted message through
>>>> KafkaUtils.createDirectStream. I followed the listed below example (refer
>>>> link) but the messages are not being fetched by the Stream.
>>>> 
>>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__stackoverflow.com_questions_30339636_spark-2Dpython-2Davro-2Dkafka-2Ddeserialiser=CwIBaQ=jcv3orpCsv7C4ly8-ubDob57ycZ4jvhoYZNDBA06fPk=NQ-dw5X8CJcqaXIvIdMUUdkL0fHDonD07FZzTY3CgiU=Nc-rPMFydyCrwOZuNWs2GmSL4NkN8eGoR-mkJUlkCx0=hwqxCKl3P4_9pKWeo1OGR134QegMRe3Xh22_WMy-5q8=
>>>> 
>>>> Is there any code missing that I must add to make the above sample work.
>>>> Say, I am not sure how the confluent serializers would know the avro schema
>>>> info as it knows only the Schema Registry URL info.
>>>> 
>>>> Appreciate your help.
>>>> 
>>>> ~Muthu
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Filter out the elements from xml file in Spark

2016-05-19 Thread Mail.com
Hi Yogesh,

Can you try a map operation with whatever parser you are using and extract what you need? 
You could also look at the spark-xml package. 

Thanks,
Pradeep
> On May 19, 2016, at 4:39 AM, Yogesh Vyas  wrote:
> 
> Hi,
> I had xml files which I am reading through textFileStream, and then
> filtering out the required elements using traditional conditions and
> loops. I would like to know if  there is any specific packages or
> functions provided in spark to perform operations on RDD of xml?
> 
> Regards,
> Yogesh
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: KafkaUtils.createDirectStream Not Fetching Messages with Confluent Serializers as Value Decoder.

2016-05-18 Thread Mail.com
Adding back users.



> On May 18, 2016, at 11:49 AM, Mail.com <pradeep.mi...@mail.com> wrote:
> 
> Hi Uladzimir,
> 
> I run it as below.
> 
> Spark-submit --class com.test --num-executors 4 --executor-cores 5 --queue 
> Dev --master yarn-client --driver-memory 512M --executor-memory 512M test.jar
> 
> Thanks,
> Pradeep
> 
> 
>> On May 18, 2016, at 5:45 AM, Vova Shelgunov <vvs...@gmail.com> wrote:
>> 
>> Hi Pradeep,
>> 
>> How do you run your spark application? What is spark master? How many cores 
>> do you allocate?
>> 
>> Regards,
>> Uladzimir
>> 
>>> On May 17, 2016 7:33 AM, "Mail.com" <pradeep.mi...@mail.com> wrote:
>>> Hi Muthu,
>>> 
>>> Are you on spark 1.4.1 and Kafka 0.8.2? I have a similar issue even for 
>>> simple string messages.
>>> 
>>> Console producer and consumer work fine. But spark always reruns empty RDD. 
>>> I am using Receiver based Approach.
>>> 
>>> Thanks,
>>> Pradeep
>>> 
>>> > On May 16, 2016, at 8:19 PM, Ramaswamy, Muthuraman 
>>> > <muthuraman.ramasw...@viasat.com> wrote:
>>> >
>>> > Yes, I can see the messages. Also, I wrote a quick custom decoder for 
>>> > avro and it works fine for the following:
>>> >
>>> >>> kvs = KafkaUtils.createDirectStream(ssc, [topic], 
>>> >>> {"metadata.broker.list": brokers}, valueDecoder=decoder)
>>> >
>>> > But, when I use the Confluent Serializers to leverage the Schema Registry 
>>> > (based on the link shown below), it doesn’t work for me. I am not sure 
>>> > whether I need to configure any more details to consume the Schema 
>>> > Registry. I can fetch the schema from the schema registry based on is 
>>> > Ids. The decoder method is not returning any values for me.
>>> >
>>> > ~Muthu
>>> >
>>> >
>>> >
>>> >> On 5/16/16, 10:49 AM, "Cody Koeninger" <c...@koeninger.org> wrote:
>>> >>
>>> >> Have you checked to make sure you can receive messages just using a
>>> >> byte array for value?
>>> >>
>>> >> On Mon, May 16, 2016 at 12:33 PM, Ramaswamy, Muthuraman
>>> >> <muthuraman.ramasw...@viasat.com> wrote:
>>> >>> I am trying to consume AVRO formatted message through
>>> >>> KafkaUtils.createDirectStream. I followed the listed below example 
>>> >>> (refer
>>> >>> link) but the messages are not being fetched by the Stream.
>>> >>>
>>> >>> https://urldefense.proofpoint.com/v2/url?u=http-3A__stackoverflow.com_questions_30339636_spark-2Dpython-2Davro-2Dkafka-2Ddeserialiser=CwIBaQ=jcv3orpCsv7C4ly8-ubDob57ycZ4jvhoYZNDBA06fPk=NQ-dw5X8CJcqaXIvIdMUUdkL0fHDonD07FZzTY3CgiU=Nc-rPMFydyCrwOZuNWs2GmSL4NkN8eGoR-mkJUlkCx0=hwqxCKl3P4_9pKWeo1OGR134QegMRe3Xh22_WMy-5q8=
>>> >>>
>>> >>> Is there any code missing that I must add to make the above sample work.
>>> >>> Say, I am not sure how the confluent serializers would know the avro 
>>> >>> schema
>>> >>> info as it knows only the Schema Registry URL info.
>>> >>>
>>> >>> Appreciate your help.
>>> >>>
>>> >>> ~Muthu
>>> >  
>>> 
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org


Re: KafkaUtils.createDirectStream Not Fetching Messages with Confluent Serializers as Value Decoder.

2016-05-16 Thread Mail.com
Hi Muthu,

Are you on Spark 1.4.1 and Kafka 0.8.2? I have a similar issue even for simple 
string messages.

The console producer and consumer work fine, but Spark always returns an empty RDD. I 
am using the receiver-based approach. 

Thanks,
Pradeep

> On May 16, 2016, at 8:19 PM, Ramaswamy, Muthuraman 
>  wrote:
> 
> Yes, I can see the messages. Also, I wrote a quick custom decoder for avro 
> and it works fine for the following:
> 
>>> kvs = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": 
>>> brokers}, valueDecoder=decoder)
> 
> But, when I use the Confluent Serializers to leverage the Schema Registry 
> (based on the link shown below), it doesn’t work for me. I am not sure 
> whether I need to configure any more details to consume the Schema Registry. 
> I can fetch the schema from the schema registry based on is Ids. The decoder 
> method is not returning any values for me.
> 
> ~Muthu
> 
> 
> 
>> On 5/16/16, 10:49 AM, "Cody Koeninger"  wrote:
>> 
>> Have you checked to make sure you can receive messages just using a
>> byte array for value?
>> 
>> On Mon, May 16, 2016 at 12:33 PM, Ramaswamy, Muthuraman
>>  wrote:
>>> I am trying to consume AVRO formatted message through
>>> KafkaUtils.createDirectStream. I followed the listed below example (refer
>>> link) but the messages are not being fetched by the Stream.
>>> 
>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__stackoverflow.com_questions_30339636_spark-2Dpython-2Davro-2Dkafka-2Ddeserialiser=CwIBaQ=jcv3orpCsv7C4ly8-ubDob57ycZ4jvhoYZNDBA06fPk=NQ-dw5X8CJcqaXIvIdMUUdkL0fHDonD07FZzTY3CgiU=Nc-rPMFydyCrwOZuNWs2GmSL4NkN8eGoR-mkJUlkCx0=hwqxCKl3P4_9pKWeo1OGR134QegMRe3Xh22_WMy-5q8=
>>>  
>>> 
>>> Is there any code missing that I must add to make the above sample work.
>>> Say, I am not sure how the confluent serializers would know the avro schema
>>> info as it knows only the Schema Registry URL info.
>>> 
>>> Appreciate your help.
>>> 
>>> ~Muthu

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Executors and Cores

2016-05-15 Thread Mail.com
Hi Mich,

We have HDP 2.3.2, where Spark runs on 21 nodes, each with 250 GB of memory. 
Jobs run in yarn-client and yarn-cluster mode.

We have other teams using the same cluster to build their applications.

Regards,
Pradeep


> On May 15, 2016, at 1:37 PM, Mich Talebzadeh <mich.talebza...@gmail.com> 
> wrote:
> 
> Hi Pradeep,
> 
> In your case what type of cluster we are taking about? A standalone cluster?
> 
> HTh
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
>> On 15 May 2016 at 13:19, Mail.com <pradeep.mi...@mail.com> wrote:
>> Hi ,
>> 
>> I have seen multiple videos on spark tuning which shows how to determine # 
>> cores, #executors and memory size of the job.
>> 
>> In all that I have seen, it seems each job has to be given the max resources 
>> allowed in the cluster.
>> 
>> How do we factor in input size as well? I am processing a 1gb compressed 
>> file then I can live with say 10 executors and not 21 etc..
>> 
>> Also do we consider other jobs in the cluster that could be running? I will 
>> use only 20 GB out of available 300 gb etc..
>> 
>> Thanks,
>> Pradeep
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
> 


Executors and Cores

2016-05-15 Thread Mail.com
Hi ,

I have seen multiple videos on Spark tuning which show how to determine the number of 
cores, number of executors, and memory size for a job.

In everything I have seen, it seems each job is given the maximum resources 
allowed in the cluster.

How do we factor in input size as well? If I am processing a 1 GB compressed 
file, then I can live with, say, 10 executors instead of 21.

Also, do we consider other jobs that could be running in the cluster? For example, I would 
use only 20 GB out of the available 300 GB.
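
As a rough illustration only (the numbers, class, and jar names below are assumptions for a roughly 1 GB compressed input on a shared cluster, not a recommendation), a right-sized submission might look like:

spark-submit --master yarn-client --num-executors 10 --executor-cores 2 --executor-memory 4g --class com.example.MyJob my-job.jar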

Thanks,
Pradeep
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark 1.4.1 + Kafka 0.8.2 with Kerberos

2016-05-13 Thread Mail.com
Hi All,

I am trying to get Spark 1.4.1 (Java) to work with Kafka 0.8.2 in a Kerberos-enabled 
cluster (HDP 2.3.2).

Is there any document I can refer to?

Thanks,
Pradeep 

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: XML Processing using Spark SQL

2016-05-12 Thread Mail.com
Hi Arun,

Could you try using StAX or JAXB?

Thanks,
Pradeep
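
For attribute-only, self-closing records like the ones quoted below, a minimal StAX sketch using the standard javax.xml.stream API (assuming the record element is named Record, as in Health Kit exports; the opening tags were stripped from the quoted snippet, and the file name is a placeholder):

import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class HealthKitStaxSketch {
    public static void main(String[] args) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader =
                factory.createXMLStreamReader(new FileInputStream("export.xml"));

        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "Record".equals(reader.getLocalName())) {
                // Everything lives in attributes, so read them directly.
                String type = reader.getAttributeValue(null, "type");
                String sourceName = reader.getAttributeValue(null, "sourceName");
                String value = reader.getAttributeValue(null, "value");
                System.out.println(type + "\t" + sourceName + "\t" + value);
            }
        }
        reader.close();
    }
}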

> On May 12, 2016, at 8:35 PM, Hyukjin Kwon  wrote:
> 
> Hi Arunkumar,
> 
> 
> I guess your records are self-closing ones.
> 
> There is an issue open here, https://github.com/databricks/spark-xml/issues/92
> 
> This is about XmlInputFormat.scala and it seems a bit tricky to handle the 
> case so I left open until now.
> 
> 
> Thanks!
> 
> 
> 2016-05-13 5:03 GMT+09:00 Arunkumar Chandrasekar :
>> Hello,
>> 
>> Greetings.
>> 
>> I'm trying to process a xml file exported from Health Kit application using 
>> Spark SQL for learning purpose. The sample record data is like the below:
>> 
>>  > sourceVersion="9.3" device="HKDevice: 0x7896, name:iPhone, 
>> manufacturer:Apple, model:iPhone, hardware:iPhone7,2, software:9.3" 
>> unit="count" creationDate="2016-04-23 19:31:33 +0530" startDate="2016-04-23 
>> 19:00:20 +0530" endDate="2016-04-23 19:01:41 +0530" value="31"/>
>> 
>>  > sourceVersion="9.3.1" device="HKDevice: 0x85746, name:iPhone, 
>> manufacturer:Apple, model:iPhone, hardware:iPhone7,2, software:9.3.1" 
>> unit="count" creationDate="2016-04-24 05:45:00 +0530" startDate="2016-04-24 
>> 05:25:04 +0530" endDate="2016-04-24 05:25:24 +0530" value="10"/>.
>> 
>> I want to have the column name of my table as the field value like type, 
>> sourceName, sourceVersion and the row entries as their respective values 
>> like HKQuantityTypeIdentifierStepCount, Vizhi, 9.3.1,..
>> 
>> I took a look at the Spark-XML, but didn't get any information in my case 
>> (my xml is not well formed with the tags). Is there any other option to 
>> convert the record that I have mentioned above into a schema format for 
>> playing with Spark SQL?
>> 
>> Thanks in Advance.
>> 
>> Thank You,
>> Arun Chandrasekar
>> chan.arunku...@gmail.com
> 


Re: Spark-csv- partitionBy

2016-05-10 Thread Mail.com
Hi,

I don't want to reduce the number of partitions; I need to write files based on the column 
value.

I am trying to understand how reducing the number of partitions will make it work.

Regards,
Pradeep
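
Since spark-csv 1.x does not support partitionBy, one possible workaround (a sketch, not something from this thread; the column name, source, and paths are placeholders, and it assumes the partition column is a string with modest cardinality) is to loop over the distinct values and write each filtered subset to its own directory:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

public class WritePerColumnValue {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("per-value-writer"));
        SQLContext sqlContext = new SQLContext(sc);
        DataFrame df = sqlContext.read().parquet("hdfs:///input/data"); // placeholder source

        // One output directory per distinct value of "someColumn".
        Row[] keys = df.select("someColumn").distinct().collect();
        for (Row key : keys) {
            String value = key.getString(0);
            df.filter(df.col("someColumn").equalTo(value))
              .write()
              .format("com.databricks.spark.csv")
              .option("delimiter", "\t")
              .save("hdfs:///output/someColumn=" + value);
        }
        sc.stop();
    }
}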

> On May 9, 2016, at 6:42 PM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
> 
> Hi,
> 
> its supported, try to use coalesce(1) (the spelling is wrong) and after that 
> do the partitions.
> 
> Regards,
> Gourav
> 
>> On Mon, May 9, 2016 at 7:12 PM, Mail.com <pradeep.mi...@mail.com> wrote:
>> Hi,
>> 
>> I have to write tab delimited file and need to have one directory for each 
>> unique value of a column.
>> 
>> I tried using spark-csv with partitionBy and seems it is not supported. Is 
>> there any other option available for doing this?
>> 
>> Regards,
>> Pradeep
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
> 


Spark-csv- partitionBy

2016-05-09 Thread Mail.com
Hi,

I have to write a tab-delimited file and need one directory for each 
unique value of a column.

I tried using spark-csv with partitionBy, and it seems that is not supported. Is 
there any other option available for doing this?

Regards,
Pradeep
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Error in spark-xml

2016-05-02 Thread Mail.com
Can you try creating your own schema and using it to read the XML?

I had a similar issue and resolved it with a custom schema that specifies 
each attribute explicitly.

Pradeep
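
A sketch of that approach with the Java API (the row tag and field names below are illustrative, not taken from the XML in this thread; declaring the schema up front skips spark-xml's schema inference):

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class XmlCustomSchemaSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("xml-custom-schema"));
        SQLContext sqlContext = new SQLContext(sc);

        // Declare every expected field explicitly instead of relying on inference.
        StructType schema = DataTypes.createStructType(Arrays.asList(
                DataTypes.createStructField("title", DataTypes.StringType, true),
                DataTypes.createStructField("author", DataTypes.StringType, true)));

        DataFrame df = sqlContext.read()
                .format("com.databricks.spark.xml")
                .option("rowTag", "book")          // element chosen as the row boundary
                .schema(schema)
                .load("hdfs:///path/to/file.xml");

        df.show();
        sc.stop();
    }
}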


> On May 1, 2016, at 9:45 AM, Hyukjin Kwon  wrote:
> 
> To be more clear,
> 
> If you set the rowTag as "book", then it will produces an exception which is 
> an issue opened here, https://github.com/databricks/spark-xml/issues/92
> 
> Currently it does not support to parse a single element with only a value as 
> a row.
> 
> 
> If you set the rowTag as "bkval", then it should work. I tested the case 
> below to double check.
> 
> If it does not work as below, please open an issue with some information so 
> that I can reproduce.
> 
> 
> I tested the case above with the data below
> 
>   
> bk_113
> bk_114
>   
>   
> bk_114
> bk_116
>   
>   
> bk_115
> bk_116
>   
> 
> 
> 
> I tested this with the codes below
> 
> val path = "path-to-file"
> sqlContext.read
>   .format("xml")
>   .option("rowTag", "bkval")
>   .load(path)
>   .show()
> 
> Thanks!
> 
> 
> 2016-05-01 15:11 GMT+09:00 Hyukjin Kwon :
>> Hi Sourav,
>> 
>> I think it is an issue. XML will assume the element by the rowTag as object.
>> 
>>  Could you please open an issue in 
>> https://github.com/databricks/spark-xml/issues please?
>> 
>> Thanks!
>> 
>> 
>> 2016-05-01 5:08 GMT+09:00 Sourav Mazumder :
>>> Hi,
>>> 
>>> Looks like there is a problem in spark-xml if the xml has multiple 
>>> attributes with no child element.
>>> 
>>> For example say the xml has a nested object as below 
>>> 
>>> bk_113
>>> bk_114
>>>  
>>> 
>>> Now if I create a dataframe starting with rowtag bkval and then I do a 
>>> select on that data frame it gives following error.
>>> 
>>> 
>>> scala.MatchError: ENDDOCUMENT (of class 
>>> com.sun.xml.internal.stream.events.EndDocumentEvent) at 
>>> com.databricks.spark.xml.parsers.StaxXmlParser$.checkEndElement(StaxXmlParser.scala:94)
>>>  at  
>>> com.databricks.spark.xml.parsers.StaxXmlParser$.com$databricks$spark$xml$parsers$StaxXmlParser$$convertObject(StaxXmlParser.scala:295)
>>>  at 
>>> com.databricks.spark.xml.parsers.StaxXmlParser$$anonfun$parse$1$$anonfun$apply$4.apply(StaxXmlParser.scala:58)
>>>  at 
>>> com.databricks.spark.xml.parsers.StaxXmlParser$$anonfun$parse$1$$anonfun$apply$4.apply(StaxXmlParser.scala:46)
>>>  at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at 
>>> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at 
>>> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) at 
>>> scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308) at 
>>> scala.collection.Iterator$class.foreach(Iterator.scala:727) at 
>>> scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at 
>>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at 
>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) 
>>> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) 
>>> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at 
>>> scala.collection.AbstractIterator.to(Iterator.scala:1157) at 
>>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) 
>>> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at 
>>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) 
>>> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at 
>>> org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
>>>  at 
>>> org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
>>>  at 
>>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
>>>  at 
>>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
>>>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at 
>>> org.apache.spark.scheduler.Task.run(Task.scala:88) at 
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) at 
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>  at 
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>  at java.lang.Thread.run(Thread.java:745)
>>> 
>>> However if there is only one row like below, it works fine.
>>> 
>>> 
>>> bk_113
>>> 
>>> 
>>> Any workaround ?
>>> 
>>> Regards,
>>> Sourav
> 


Re: JavaSparkContext.wholeTextFiles read directory

2016-04-26 Thread Mail.com
wholeTextFiles() works; it just does not provide the parallelism.

This is on Spark 1.4, HDP 2.3.2, batch jobs.

Thanks

> On Apr 26, 2016, at 9:16 PM, Harjit Singh <harjit.si...@deciphernow.com> 
> wrote:
> 
> You will have to write your customReceiver to do that. I don’t think 
> wholeTextFile is designed for that.
> 
> - Harjit
>> On Apr 26, 2016, at 7:19 PM, Mail.com <pradeep.mi...@mail.com> wrote:
>> 
>> 
>> Hi All,
>> I am reading entire directory of gz XML files with wholeTextFiles. 
>> 
>> I understand as it is gz and with wholeTextFiles the individual files are 
>> not splittable but why the entire directory is read by one executor, single 
>> task. I have provided number of executors as number of files in that 
>> directory.
>> 
>> Is the only option here is to repartition after the xmls are read and parsed 
>> with JaxB.
>> 
>> Regards,
>> Pradeep
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
> 
> v/r,
> Harjit Singh
> Decipher Technology Studios
> email:harjit.sin...@deciphernow.com
> mobile: 303-870-0883
> website: deciphertechstudios.com <http://deciphertechstudios.com/>
> 
> GPG:
> keyserver: hkps://hkps.pool.sks-keyservers.net
> keyid: D814A2EF
> 
> 
> 
> 
> 


JavaSparkContext.wholeTextFiles read directory

2016-04-26 Thread Mail.com

Hi All,
I am reading an entire directory of gzipped XML files with wholeTextFiles. 

I understand that because the files are gzipped and read with wholeTextFiles, the individual 
files are not splittable, but why is the entire directory read by one executor in a single 
task? I have set the number of executors to the number of files in that directory.

Is the only option here to repartition after the XMLs are read and parsed 
with JAXB?

Regards,
Pradeep
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Create tab separated file from a dataframe spark 1.4 with Java

2016-04-21 Thread Mail.com
Hi,

I have a DataFrame and need to write it to a tab-separated file using Spark 1.4 
and Java.

Can someone please suggest an approach?

Thanks,
Pradeep 
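
The spark-csv writer with option("delimiter", "\t") is one route (as in the partitionBy workaround shown earlier). A dependency-free sketch using only the core Spark 1.4 Java API is to map each Row to a tab-joined line; note it does not escape tab characters that appear inside values, and the method and path names are illustrative:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;

public class TabSeparatedWriter {
    // Converts each Row to a tab-joined line and writes plain text part files.
    public static void writeTsv(DataFrame df, String outputPath) {
        JavaRDD<String> lines = df.toJavaRDD().map(new Function<Row, String>() {
            @Override
            public String call(Row row) {
                return row.mkString("\t");
            }
        });
        lines.saveAsTextFile(outputPath);
    }
}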

Re: spark on yarn

2016-04-20 Thread Mail.com
I get an error with a message that states the maximum number of cores allowed.


> On Apr 20, 2016, at 11:21 AM, Shushant Arora  
> wrote:
> 
> I am running a spark application on yarn cluster.
> 
> say I have available vcors in cluster as 100.And I start spark application 
> with --num-executors 200 --num-cores 2 (so I need total 200*2=400 vcores) but 
> in my cluster only 100 are available.
> 
> What will happen ? Will the job abort or it will be submitted successfully 
> and 100 vcores will be aallocated to 50 executors and rest executors will be 
> started as soon as vcores are available ?
> 
> Please note dynamic allocation is not enabled in cluster. I have old version 
> 1.2.
> 
> Thanks
> 

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Parse XML using java spark

2016-04-18 Thread Mail.com

You might look at using JAXB or StAX. If the XML is simple enough, use the DataFrame 
approach with an auto-generated schema.

Pradeep
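
For completeness, a short sketch of the spark-xml route from Java (assuming the spark-xml package is on the classpath; rowTag "page" is the per-article record element in Wikipedia dumps, and the input path is a placeholder):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class WikipediaDumpReader {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("wiki-xml"));
        SQLContext sqlContext = new SQLContext(sc);

        // Each <page> element becomes one row; the schema is inferred from its child elements.
        DataFrame pages = sqlContext.read()
                .format("com.databricks.spark.xml")
                .option("rowTag", "page")
                .load("hdfs:///path/to/enwiki-latest-pages-articles.xml");

        pages.select("title").show(10);
        sc.stop();
    }
}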

> On Apr 18, 2016, at 6:37 PM, Jinan Alhajjaj  wrote:
> 
> Thank you for your help.
> I would like to parse the XML file using Java, not Scala. Can you please 
> provide me with an example of how to parse XML via Java using Spark? My XML 
> file is a Wikipedia dump file.
> Thank you