[pyspark] java.lang.NoSuchMethodError: net.jpountz.util.Utils.checkRange error

2017-08-03 Thread kulas...@gmail.com
Hi all, I use spark-streaming 2.2.0 with Python and read data from a Kafka (2.11-0.10.0.0) cluster, following the Kafka integration guide http://spark.apache.org/docs/latest/streaming-kafka-0-8-integration.html, and I submit a python script with spark-submit --jars
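For reference, a minimal sketch of the 0-8 integration path from that guide; the ZooKeeper address, group id and topic name are placeholders, and the assembly jar version passed to --jars must match the Spark/Scala build in use (the class in the error belongs to the lz4 library that both Spark and the Kafka client depend on, so conflicting lz4 jar versions on the classpath are worth checking):

    # Submit with the matching assembly jar, e.g.:
    #   spark-submit --jars spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar my_stream.py
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="KafkaWordCount")
    ssc = StreamingContext(sc, 10)  # 10-second batches

    # "zk-host:2181", "my-group" and "my-topic" are placeholder names
    stream = KafkaUtils.createStream(ssc, "zk-host:2181", "my-group", {"my-topic": 1})
    stream.pprint()

    ssc.start()
    ssc.awaitTermination()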

Unsubscribe

2017-08-03 Thread Parijat Mazumdar
Unsubscribe - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: PySpark Streaming S3 checkpointing

2017-08-03 Thread Riccardo Ferrari
Hi Steve, Thank you for your answer, much appreciated. Reading the code, it seems that: - Python StreamingContext.getOrCreate calls Scala StreamingContextPythonHelper().tryRecoverFromCheckpoint(
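For context, a minimal sketch of the getOrCreate pattern being discussed, with an S3 checkpoint directory; the bucket name and batch interval are placeholders:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    CHECKPOINT = "s3a://my-bucket/streaming-checkpoint"  # placeholder bucket/path

    def create_context():
        sc = SparkContext(appName="S3CheckpointedStream")
        ssc = StreamingContext(sc, 30)
        # ... build the DStream graph here ...
        ssc.checkpoint(CHECKPOINT)
        return ssc

    # On restart this tries to recover the context from the checkpoint directory;
    # otherwise it calls create_context() to build a fresh one.
    ssc = StreamingContext.getOrCreate(CHECKPOINT, create_context)
    ssc.start()
    ssc.awaitTermination()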

Re: SPARK Issue in Standalone cluster

2017-08-03 Thread Marco Mistroni
Hello, my 2 cents here, hope it helps. If you want to just play around with Spark, I'd leave Hadoop out; it's an unnecessary dependency that you don't need for just running a python script. Instead do the following: - go to the root of your master / slave node. create a directory /root/pyscripts -
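As an illustration of that suggestion, a self-contained script that runs on a bare standalone cluster with no Hadoop installation; the paths and master URL are placeholders:

    # /root/pyscripts/wordcount.py -- placeholder path, per the suggestion above
    # Submit with e.g.: spark-submit --master spark://master-host:7077 /root/pyscripts/wordcount.py
    from pyspark import SparkContext

    sc = SparkContext(appName="StandaloneWordCount")
    # Local file, no HDFS needed; note it must exist at the same path on every worker node
    lines = sc.textFile("file:///root/pyscripts/sample.txt")
    counts = (lines.flatMap(lambda l: l.split())
                   .map(lambda w: (w, 1))
                   .reduceByKey(lambda a, b: a + b))
    print(counts.take(10))
    sc.stop()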

SparkEventListener dropping events

2017-08-03 Thread Miles Crawford
We are seeing lots of stability problems with Spark 2.1.1 as a result of dropped events. We disabled the event log, which seemed to help, but many events are still being dropped, as in the example log below. Is there any way for me to see which listener is backing up the queue? Is there any
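One knob that is often suggested for dropped events (headroom, not a guaranteed fix) is the listener bus queue size; a sketch of setting it, assuming the Spark 2.x property name spark.scheduler.listenerbus.eventqueue.size:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("LargerListenerQueue")
            # the default has historically been 10000; raising it only buys headroom,
            # a listener that is consistently too slow will still fall behind
            .set("spark.scheduler.listenerbus.eventqueue.size", "100000"))
    sc = SparkContext(conf=conf)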

Re: DataSet creation not working Spark 1.6.0 , populating wrong data CDH 5.7.1

2017-08-03 Thread Rabin Banerjee
Unfortunately I can't use scala/python. Any solution in Java? On Thu, Aug 3, 2017 at 6:04 PM, Gourav Sengupta wrote: > Guru, > > Anyway you can pick up SCALA or Python. Makes things way easier. The > performance, maintainability, visibility, and minimum translation

Re: Repartitioning Hive tables - Container killed by YARN for exceeding memory limits

2017-08-03 Thread Chetan Khatri
Thanks Holden! On Thu, Aug 3, 2017 at 4:02 AM, Holden Karau wrote: > The memory overhead is based less on the total amount of data and more on > what you end up doing with the data (e.g. if you're doing a lot of off-heap > processing or using Python you need to increase
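For reference, a sketch of raising the overhead being discussed, using the pre-2.3 property name spark.yarn.executor.memoryOverhead; the sizes are placeholders:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("RepartitionHiveTable")
            .set("spark.executor.memory", "8g")                  # placeholder heap size
            .set("spark.yarn.executor.memoryOverhead", "2048"))  # MB of off-heap headroom per executor
    sc = SparkContext(conf=conf)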

mapPartitionsWithIndex in Dataframe

2017-08-03 Thread Lalwani, Jayesh
Are there any plans to add mapPartitionsWithIndex to the Dataframe API? Or is there any way to implement my own mapPartitionsWithIndex for a Dataframe? I am implementing something which is logically similar to the randomSplit function. In 2.1, randomSplit internally does
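Until something like this exists in the Dataframe API, one common workaround is to drop to the underlying RDD, use RDD.mapPartitionsWithIndex, and rebuild a Dataframe; a rough Python sketch with placeholder column names (note this detours through an RDD, so Catalyst optimizations are lost for that step):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("PartitionIndexDemo").getOrCreate()
    df = spark.range(0, 100)  # toy Dataframe with a single "id" column

    def tag_partition(index, rows):
        # emit (partition_index, id) pairs, e.g. as input to a custom split
        for row in rows:
            yield (index, row.id)

    tagged = spark.createDataFrame(df.rdd.mapPartitionsWithIndex(tag_partition),
                                   ["partition", "id"])
    tagged.show(5)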

Re: DataSet creation not working Spark 1.6.0 , populating wrong data CDH 5.7.1

2017-08-03 Thread Abdallah Mahmoud
Unsubscribe me please. Sent with Mailtrack. Abdallah Mahmoud Zidan, Junior Data Scientist & Software Developer, (+20) 01027228807. On 3 August 2017 at 14:39, Jörn Franke

Re: DataSet creation not working Spark 1.6.0 , populating wrong data CDH 5.7.1

2017-08-03 Thread Jörn Franke
You need to create a schema for person. https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema > On 3. Aug 2017, at 12:09, Rabin Banerjee wrote: > > Hi All, > > I am trying to create a DataSet from DataFrame, where
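For completeness, a minimal sketch of the programmatic-schema approach from that section of the guide, shown here in Python against the 1.6-era SQLContext entry point (the thread asks about Java, where the equivalent builders live in org.apache.spark.sql.types.DataTypes); the field names and types are placeholders standing in for the Person bean's attributes:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    sc = SparkContext(appName="ExplicitSchema")
    sqlContext = SQLContext(sc)  # Spark 1.6-style entry point

    # Placeholder fields standing in for the Person bean's attributes
    schema = StructType([
        StructField("name", StringType(), True),
        StructField("age", IntegerType(), True),
    ])

    df = sqlContext.createDataFrame([("alice", 30), ("bob", 25)], schema)
    df.show()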

Re: DataSet creation not working Spark 1.6.0 , populating wrong data CDH 5.7.1

2017-08-03 Thread Gourav Sengupta
Guru, Anyway you can pick up SCALA or Python. Makes things way easier. The performance, maintainability, visibility, and minimum translation loss make things better. Regards, Gourav On Thu, Aug 3, 2017 at 11:09 AM, Rabin Banerjee < dev.rabin.baner...@gmail.com> wrote: > Hi All, > > I am

DataSet creation not working Spark 1.6.0 , populating wrong data CDH 5.7.1

2017-08-03 Thread Rabin Banerjee
Hi All, I am trying to create a DataSet from a DataFrame; the DataFrame has been created successfully, and using the same bean I am trying to create the Dataset. When I run it, the DataFrame is created as expected and I am able to print its content as well, but not the Dataset. The DataSet is

Re: Quick one on evaluation

2017-08-03 Thread Daniel Darabos
On Wed, Aug 2, 2017 at 2:16 PM, Jean Georges Perrin wrote: > Hi Sparkians, > > I understand the lazy evaluation mechanism with transformations and > actions. My question is simpler: 1) are show() and/or printSchema() > actions? I would assume so... > show() is an action (it prints
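A quick way to see the difference for yourself: printSchema() only inspects the schema and does not launch a job, while show() has to actually compute rows, so an error that only occurs during row evaluation surfaces at show(). A small sketch using a deliberately failing UDF (names are placeholders):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    spark = SparkSession.builder.appName("LazyEvalDemo").getOrCreate()

    # This UDF fails only when it is actually evaluated against rows
    explode_on_use = udf(lambda x: 1 // 0, IntegerType())
    df = spark.range(5).withColumn("boom", explode_on_use("id"))

    df.printSchema()   # no job runs; only the schema is printed
    df.show()          # forces evaluation of rows, so the division error surfaces here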

Re: SPARK Issue in Standalone cluster

2017-08-03 Thread Gourav Sengupta
Hi Steve, I love you mate, thanks a ton once again for ACTUALLY RESPONDING. I am now going through the documentation ( https://github.com/steveloughran/hadoop/blob/s3guard/HADOOP-13786-committer/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3a_committer_architecture.md) and it

Re: SPARK Issue in Standalone cluster

2017-08-03 Thread Steve Loughran
On 2 Aug 2017, at 20:05, Gourav Sengupta wrote: Hi Steve, I have written a sincere note of apology to everyone in a separate email. I sincerely request your kind forgiveness beforehand if anything does sound impolite in my