Re: Manually reading parquet files.

2019-03-21 Thread Ryan Blue
You're getting InternalRow instances. They probably have the data you want, but the toString representation doesn't match the data for InternalRow.

On Thu, Mar 21, 2019 at 3:28 PM Long, Andrew wrote:
> Hello Friends,
>
> I'm working on a performance improvement that reads additional parquet
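A minimal sketch of decoding such InternalRow instances into external Rows (whose toString shows the actual values); it relies on Spark 2.x internal APIs, and the schema here is a hypothetical stand-in for the parquet file's schema:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.catalyst.encoders.RowEncoder
    import org.apache.spark.sql.types._

    // Hypothetical schema matching the parquet file being read.
    val schema = StructType(Seq(
      StructField("id", LongType),
      StructField("name", StringType)))

    // RowEncoder is an internal Spark 2.x API; resolveAndBind() prepares
    // the encoder to deserialize an InternalRow into an external Row.
    val encoder = RowEncoder(schema).resolveAndBind()

    def toReadable(ir: InternalRow): Row = encoder.fromRow(ir)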

Fwd: Cross Join

2019-03-21 Thread asma zgolli
-- Forwarded message -
From: asma zgolli
Date: Thu, Mar 21, 2019 at 6:15 PM
Subject: Cross Join
To:

Hello, I need to cross my data, and I'm executing a cross join on two dataframes: C = A.crossJoin(B). A has 50 records and B has 5 records. The result I'm getting with Spark 2.0 is a
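For reference, a self-contained sketch of the cross join described above (the names A, B, and C follow the post); a Cartesian product of 50 and 5 rows should yield 250 rows:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("cross-join-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val A = (1 to 50).toDF("a")  // 50 records
    val B = (1 to 5).toDF("b")   // 5 records

    // crossJoin requests the Cartesian product explicitly, so the
    // spark.sql.crossJoin.enabled safety flag is not needed.
    val C = A.crossJoin(B)
    assert(C.count() == 250)     // 50 x 5 rows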

Spark streaming error - Query terminated with exception: assertion failed: Invalid batch: a#660,b#661L,c#662,d#663,... 26 more fields != b#1291L

2019-03-21 Thread kineret M
I'm trying to read a stream using my custom data source (v2, using Spark 2.3), and it fails *in the second iteration*, while pruning columns, with the following exception: Query [id=xxx, runId=yyy] terminated with exception: assertion failed: Invalid batch: a#660,b#661L,c#662,d#663,... 26 more
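One plausible cause (an assumption, not confirmed in the thread) is a reader that accepts pruneColumns but does not reflect it in readSchema(), so the rows it emits no longer match the schema the planner expects. A minimal sketch against the Spark 2.3 DataSourceV2 interfaces, with a hypothetical reader class:

    import java.util.{List => JList}
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.sources.v2.reader.{
      DataReaderFactory, DataSourceReader, SupportsPushDownRequiredColumns}
    import org.apache.spark.sql.types.StructType

    class MyStreamReader(fullSchema: StructType)
        extends DataSourceReader with SupportsPushDownRequiredColumns {

      private var prunedSchema: StructType = fullSchema

      // Spark calls this with only the columns the query needs.
      override def pruneColumns(requiredSchema: StructType): Unit = {
        prunedSchema = requiredSchema
      }

      // Must report the pruned schema, and the emitted rows must match it;
      // a mismatch triggers the "Invalid batch: ... != ..." assertion.
      override def readSchema(): StructType = prunedSchema

      override def createDataReaderFactories(): JList[DataReaderFactory[Row]] =
        ???  // produce rows that match prunedSchema
    }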

How shall I configure the Spark executor memory size and the Alluxio worker memory size on a machine?

2019-03-21 Thread u9g
Hey, we have a cluster of 10 nodes, each of which has 128 GB of memory. We are about to run Spark and Alluxio on the cluster. We wonder how we should allocate the memory between the Spark executor and the Alluxio worker on each machine. Are there any recommendations? Thanks! Best, Andy Li
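For illustration only, one hypothetical way to split a 128 GB node (the numbers are placeholders, not a recommendation from this thread), leaving headroom for the OS and other daemons:

    # alluxio-site.properties -- memory reserved for the Alluxio worker ramdisk
    alluxio.worker.memory.size=32GB

    # spark-defaults.conf -- per-executor heap; with one executor per node,
    # 32 GB (Alluxio) + 80 GB (heap) + 8 GB (overhead) leaves ~8 GB for the OS.
    spark.executor.memory=80g
    spark.executor.memoryOverhead=8g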

Re: [HELP WANTED] Apache Zipkin (incubating) needs Spark gurus

2019-03-21 Thread Reynold Xin
Are there specific questions you have? It might be easier to post them here as well.

On Wed, Mar 20, 2019 at 5:16 PM Andriy Redko wrote:
> Hello Dear Spark Community!
>
> The hyper-popularity of Apache Spark has made it a de-facto choice for many projects that need some sort of data processing