RE: Spark application fail wit numRecords error

2017-11-01 Thread Serkan TAS
Hi, I checked the following threads, but I am still not sure whether it is misuse, a common issue, or a bug. https://stackoverflow.com/questions/34989539/spark-streaming-from-kafka-has-error-numrecords-must-not-be-negative

Re: Fwd: Dose pyspark supports python3.6?

2017-11-01 Thread makoto
I'm not sure whether pyspark officially supports python 3.6, but pyspark and python 3.6 are working in my environment. I found the following issue, and it seems to be already resolved. https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-19019 2017/11/02 11:54 AM "Jun Shi" :

RE: Dose pyspark supports python3.6?

2017-11-01 Thread van den Heever, Christian CC
Dear Spark users, I have been asked to provide a presentation / business case on why to use Spark and Java as an ingestion tool for HDFS and Hive, and why to move away from an ETL tool. Could you be so kind as to provide some pros and cons for this? I have the following: Pros: In house

Fwd: Dose pyspark supports python3.6?

2017-11-01 Thread Jun Shi
Dear Spark developers: It’s so exciting to send this email to you. I have a question: does pyspark support python3.6? (Some answers I found online say no.) Can you tell me which Python versions pyspark supports? I’m looking forward to your

Re: Writing custom Structured Streaming receiver

2017-11-01 Thread Tathagata Das
Structured Streaming source APIs are not yet public, so there isn't a guide. However, if you are adventurous enough, you can take a look at the source code in Spark. Source API: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Source.scala

Writing custom Structured Streaming receiver

2017-11-01 Thread Daniel Haviv
Hi, Is there a guide to writing a custom Structured Streaming receiver? Thank you. Daniel

Re: Spark application fail wit numRecords error

2017-11-01 Thread Prem Sure
Hi, is there any offset left over from before the new topic consumption? It may be that the stored offset is beyond the current latest offset, which makes the record count negative. I hope the Kafka brokers are healthy and up; that can also be a reason sometimes. On Wed, Nov 1, 2017 at 11:40 AM, Serkan TAS wrote: >
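The scenario described in this reply can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Spark's actual code: it assumes the record count for a batch is the end offset minus the start offset for each partition, and the names `OffsetRange`, `count_records`, `healthy`, and `stale` are hypothetical stand-ins chosen for the example.

```python
from typing import List, NamedTuple


class OffsetRange(NamedTuple):
    """Hypothetical stand-in for one Kafka partition's range in a batch."""
    topic: str
    partition: int
    from_offset: int   # where the consumer resumes, e.g. a stored offset
    until_offset: int  # latest offset currently reported by the broker

    def count(self) -> int:
        # Assumed definition: records in the range = end minus start.
        return self.until_offset - self.from_offset


def count_records(ranges: List[OffsetRange]) -> int:
    """Sum the per-partition counts and reject a negative total."""
    total = sum(r.count() for r in ranges)
    if total < 0:
        raise ValueError("numRecords must not be negative")
    return total


# Normal case: the stored offset is behind the latest offset.
healthy = [OffsetRange("events", 0, 100, 150)]

# Suspected failure case: a leftover offset (500) is beyond the
# current latest offset (20), e.g. after the topic was recreated.
stale = [OffsetRange("events", 0, 500, 20)]

print(count_records(healthy))
try:
    count_records(stale)
except ValueError as err:
    print("failed:", err)
```

If this is indeed the cause, comparing the stored offsets against the broker's current latest offsets for each partition would confirm it before the job is restarted.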

Announcing Spark on Kubernetes release 0.5.0

2017-11-01 Thread Yinan Li
The Spark on Kubernetes development community is pleased to announce release 0.5.0 of Apache Spark with Kubernetes as a native scheduler back-end! This release includes a few bug fixes and the following features: - Spark R support - Kubernetes 1.8 support - Mounts emptyDir volumes for

Re: Read parquet files as buckets

2017-11-01 Thread Michael Artz
Hi, what about the DAG? Can you send that as well, from the resulting "write" call? On Wed, Nov 1, 2017 at 5:44 AM, אורן שמון wrote: > The version is 2.2.0. > The code for the write is: > sortedApiRequestLogsDataSet.write > .bucketBy(numberOfBuckets, "userId")

Logistic regression in Spark TestCase

2017-11-01 Thread cjn
Hi, Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. (Please see the graph below.) Does anyone know what kind of test dataset and cluster configuration produces these results? And where can I get the test dataset? Thanks in advance. Best

Spark application fail wit numRecords error

2017-11-01 Thread Serkan TAS
Hi, I searched for the error in Kafka, but I now think it is related to Spark, not Kafka. Has anyone faced an exception that terminates the program with the error "numRecords must not be negative" while streaming? Thanks in advance. Regards. This message