Split a row into multiple rows Java

2018-07-25 Thread nookala
I'm trying to generate multiple rows from a single row. I have the schema Name Id Date 0100 0200 0300 0400 and would like to make it into a vertical format with the schema Name Id Date Time. I have the code below and get the error Caused by: java.lang.RuntimeException:
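
For readers facing the same reshaping problem, the wide-to-long step described above can be sketched in plain Python. This is only an illustration of the logic, not the poster's Java code (which is cut off in this listing). The column names come from the schema in the post. Emitting the column label as Time together with the cell value is an assumption, since the post does not say what the vertical rows should carry.

```python
from typing import Dict, Iterator, Tuple

# Column names taken from the schema quoted in the post.
KEY_COLUMNS = ["Name", "Id", "Date"]
TIME_COLUMNS = ["0100", "0200", "0300", "0400"]


def split_row(row: Dict[str, object]) -> Iterator[Tuple[object, ...]]:
    """Yield one (Name, Id, Date, Time, value) tuple per time column.

    Carrying the cell value along with the Time label is an assumption.
    """
    keys = tuple(row[c] for c in KEY_COLUMNS)
    for time_col in TIME_COLUMNS:
        yield keys + (time_col, row[time_col])


# Made-up sample row, for illustration only.
wide = {"Name": "a", "Id": 1, "Date": "2018-07-25",
        "0100": 10, "0200": 20, "0300": 30, "0400": 40}
long_rows = list(split_row(wide))
```

In Spark, a per-row function of this shape is the kind of thing that would typically be passed to a flatMap operation, so that each input row produces several output rows.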

Re: Use Arrow instead of Pickle without pandas_udf

2018-07-25 Thread Hichame El Khalfi
Hey Holden, Thanks for your reply. We are currently using a Python function that produces a Row(TS=LongType(), bin=BinaryType()). We use this function like this: dataframe.rdd.map(my_function).toDF().write.parquet(). To reuse it in pandas_udf, we changed the return type to

Re: Use Arrow instead of Pickle without pandas_udf

2018-07-25 Thread Holden Karau
Not currently. What's the problem with pandas_udf for your use case?
On Wed, Jul 25, 2018 at 1:27 PM, Hichame El Khalfi wrote:
> Hi There,
> Is there a way to use Arrow format instead of Pickle but without using pandas_udf?
> Thanks for your help,
> Hichame

Use Arrow instead of Pickle without pandas_udf

2018-07-25 Thread Hichame El Khalfi
Hi There, Is there a way to use Arrow format instead of Pickle but without using pandas_udf? Thanks for your help, Hichame

Backpressure initial rate not working

2018-07-25 Thread Biplob Biswas
I have enabled the spark.streaming.backpressure.enabled setting and also set spark.streaming.backpressure.initialRate to 15000, but my Spark job is not respecting these settings when reading from Kafka after a failure. In my Kafka topic around 500k records are waiting to be processed and
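
As a rough sketch of the configuration under discussion, the settings can be collected and rendered as spark-submit arguments. The first two keys and values are the ones named in the post. The third, spark.streaming.kafka.maxRatePerPartition, is a documented Spark setting that caps records per second per Kafka partition for the direct stream. It is added here as an assumption, with a made-up value, and the thread does not confirm that it resolves the behavior described.

```python
from typing import Dict, List

# The first two settings are quoted from the post. The third is an
# assumption added for illustration, and its value is hypothetical.
settings: Dict[str, str] = {
    "spark.streaming.backpressure.enabled": "true",
    "spark.streaming.backpressure.initialRate": "15000",
    "spark.streaming.kafka.maxRatePerPartition": "1000",
}


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render each setting as a pair of spark-submit arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


submit_args = to_submit_args(settings)
```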

Re: [Spark Structured Streaming on K8S]: Debug - File handles/descriptor (unix pipe) leaking

2018-07-25 Thread Yuval.Itzchakov
We're experiencing the exact same issue while running load tests on Spark 2.3.1 with Structured Streaming and `mapGroupsWithState`.

Re: Bug in Window Function

2018-07-25 Thread Jacek Laskowski
Hi Elior, Could you show the query that led to the exception? Regards, Jacek Laskowski

Bug in Window Function

2018-07-25 Thread Elior Malul
Exception in thread "main" org.apache.spark.sql.AnalysisException: collect_set(named_struct(value, country#123 AS value#346, count, (cast(count(country#123) windowspecdefinition(campaign_id#104, app_id#93, country#123, ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as double) /

UNSUBSCRIBE

2018-07-25 Thread sridhararao mutluri

How does a Spark Streaming program call a Python file

2018-07-25 Thread 康逸之
I am trying to build a real-time system with Spark (written in Scala), but some of the algorithm files are written in Python. How can I call these algorithm files? Any idea how to make this work?
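
One possible approach, sketched here under assumptions rather than taken from the thread, is to wrap the Python algorithm as a line-oriented filter that reads records on stdin and writes results on stdout. An external caller, for example Spark's RDD.pipe on the Scala side, can then drive it as a subprocess. The score function, the comma-separated record format, and the --stdin flag are all hypothetical placeholders.

```python
import sys
from typing import Iterable, Iterator


def score(line: str) -> str:
    """Hypothetical stand-in for the real algorithm: counts the fields."""
    fields = line.split(",")
    return f"{line}\t{len(fields)}"


def process(lines: Iterable[str]) -> Iterator[str]:
    """Apply score() to every non-empty input line."""
    for raw in lines:
        line = raw.rstrip("\n")
        if line:
            yield score(line)


# Filter mode is opt-in through a hypothetical --stdin flag, so importing
# or running the file without the flag never blocks on input.
if __name__ == "__main__" and "--stdin" in sys.argv[1:]:
    for result in process(sys.stdin):
        print(result)
```

Saved as a script (the file name is up to the reader), it would be launched with the --stdin flag and fed one record per line. The one-line-in, one-line-out contract keeps the Scala and Python sides decoupled, at the cost of serializing every record as text.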

How does a Spark Streaming program (written in Scala) call a Python file

2018-07-25 Thread 康逸之
I am trying to build a real-time system with Spark (written in Scala), but some of the algorithm files are written in Python. How can I call these algorithm files? Any idea how to make this work?