Re: Why Spark generates Java code and not Scala?

2019-11-11 Thread Marcin Tustin
> If you look inside of the generation we generate java code and compile it
> with Janino. For interested folks the conversation moved over to the dev@
> list
>
> On Sat, Nov 9, 2019 at 10:37 AM Marcin Tustin wrot

Re: Why Spark generates Java code and not Scala?

2019-11-09 Thread Marcin Tustin
What do you mean by this? Spark is written in a combination of Scala and Java, and then compiled to Java bytecode, as is typical for both Scala and Java. If there's additional bytecode generation happening, it's Java bytecode, because the platform runs on the JVM. On Sat, Nov 9, 2019 at 12:47

Re: Collecting large dataset

2019-09-05 Thread Marcin Tustin
Stop using collect for this purpose. Either continue your further processing in Spark (maybe you need to use streaming), or sink the data to something that can accept it (GCS, S3, Azure Storage, Redshift, Elasticsearch, whatever), and have further processing read from that sink. On Thu, Sep 5,
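
A minimal sketch of the "sink the data" option, assuming PySpark and a Parquet sink (the message names neither, so both are assumptions). The function name `sink_to_storage` and the example paths are hypothetical. The point is only that the write happens on the executors, so the full dataset never has to fit in driver memory:

```python
def sink_to_storage(df, output_path):
    """Write `df` to storage from the executors instead of collecting it.

    `df` is assumed to be a pyspark.sql.DataFrame; `output_path` is a
    hypothetical location such as an s3a:// or gs:// URI.
    """
    # df.collect() would pull every row into driver memory.
    # DataFrameWriter instead writes the partitions out from the executors.
    df.write.mode("overwrite").parquet(output_path)
    return output_path


# Example use (needs a running Spark session, so it is left as a comment):
#   sink_to_storage(spark.table("events"), "s3a://my-bucket/events/")
#   later = spark.read.parquet("s3a://my-bucket/events/")
```

Downstream jobs can then read from the sink at their own pace rather than depending on one large in-memory result.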

Re: How to combine all rows into a single row in DataFrame

2019-08-19 Thread Marcin Tustin
It sounds like you want to aggregate your rows in some way. I actually just wrote a blog post about that topic: https://medium.com/@albamus/spark-aggregating-your-data-the-fast-way-e37b53314fad On Mon, Aug 19, 2019 at 4:24 PM Rishikesh Gawade wrote: