Re: protobuf data as input to spark streaming

2022-04-08 Thread Kiran Biswal
Hello Stelios, just a gentle follow-up if you can share any sample code/repo. Regards, Kiran. On Wed, Apr 6, 2022 at 3:19 PM Kiran Biswal wrote: > Hello Stelios > > Preferred language would have been Scala or PySpark, but if Java is proven I am open to using it > > Any sample reference or
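
Below is a minimal Structured Streaming sketch of the kind of sample being asked for here. It is an assumption-laden illustration, not code from this thread: the generated protobuf class com.example.proto.MyEvent (with getId/getValue accessors), the Kafka topic "events", and the broker address are all hypothetical placeholders.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical row shape produced from the protobuf message.
case class Event(id: String, value: Long)

object ProtoStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("proto-stream-sketch").getOrCreate()

    // Deserialize the raw Kafka payload with the protobuf-java generated parser.
    val parseEvent = udf { (bytes: Array[Byte]) =>
      val msg = com.example.proto.MyEvent.parseFrom(bytes)
      Event(msg.getId, msg.getValue)
    }

    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      .select(parseEvent(col("value")).as("event")) // Kafka "value" column is BinaryType
      .select("event.*")

    events.writeStream.format("console").start().awaitTermination()
  }
}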

Grabbing the current MemoryManager in a plugin

2022-04-08 Thread Andrew Melo
Hello, I've implemented support for my DSv2 plugin to back its storage with ArrowColumnVectors, which necessarily means using off-heap memory. Is it possible to somehow grab either a reference to the current MemoryManager so that the off-heap memory usage is properly accounted for and to prevent
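
For reference, a minimal sketch of the Arrow side of this setup: it caps Arrow's off-heap usage with its own RootAllocator and wraps the result in Spark's public ArrowColumnVector, but it does not hook into Spark's internal MemoryManager, which is exactly the accounting question raised here. The sizes and column name are arbitrary.

import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.IntVector
import org.apache.spark.sql.vectorized.ArrowColumnVector

// Dedicated allocator with a 64 MiB cap on off-heap allocations.
val allocator = new RootAllocator(64L * 1024 * 1024)

// Fill a plain Arrow vector and expose it to Spark as a ColumnVector.
val ints = new IntVector("values", allocator)
ints.allocateNew(1024)
(0 until 1024).foreach(i => ints.setSafe(i, i))
ints.setValueCount(1024)

val column = new ArrowColumnVector(ints)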

Fwd: Spark Write BinaryType Column as continuous file to S3

2022-04-08 Thread Philipp Kraus
Hello, > On 08.04.2022 at 17:34, Lalwani, Jayesh wrote: > > What format are you writing the file to? Are you planning on your own custom format, or are you planning to use standard formats like parquet? I’m dealing with geo-spatial data (Apache Sedona), so I have got a data frame with

Fwd: Spark Write BinaryType Column as continuous file to S3

2022-04-08 Thread Philipp Kraus
Hello, this sounds great for the first step. > On 08.04.2022 at 17:25, Sean Owen wrote: > > You can certainly write that UDF. You get a column in a DataFrame of array type and you can write that to any appropriate format. > What do you mean by continuous byte

Re: Spark Write BinaryType Column as continuous file to S3

2022-04-08 Thread Lalwani, Jayesh
What format are you writing the file to? Are you planning on your own custom format, or are you planning to use standard formats like Parquet? Note that Spark can write numeric data in most standard formats. If you use a custom format instead, whoever consumes the data needs to parse your data.

Re: Spark Write BinaryType Column as continuous file to S3

2022-04-08 Thread Sean Owen
That's for strings, but still doesn't address what is desired w.r.t. writing a binary column. On Fri, Apr 8, 2022 at 10:31 AM Bjørn Jørgensen wrote: > In the new Spark 3.3 there will be an SQL function > https://github.com/apache/spark/commit/25dd4254fed71923731fd59838875c0dd1ff665a > hope this

Re: Spark Write BinaryType Column as continuous file to S3

2022-04-08 Thread Bjørn Jørgensen
In the new Spark 3.3 there will be an SQL function https://github.com/apache/spark/commit/25dd4254fed71923731fd59838875c0dd1ff665a which I hope can help you. On Fri, Apr 8, 2022 at 17:14, Philipp Kraus <philipp.kraus.flashp...@gmail.com> wrote: > Hello, > > I have got a data frame with numerical data

Re: Spark 3.0.1 and spark 3.2 compatibility

2022-04-08 Thread Darcy Shen
> Do I need to recompile my application with 3.2 dependencies, or will an application compiled with 3.0.1 work fine on 3.2? Yes. And here is how to compile conditionally for Apache Spark 3.1.x and Apache Spark >= 3.2.x: object XYZ {
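
The snippet above is cut off in the archive; as a stand-in, here is a sketch of one common pattern: branching at runtime on org.apache.spark.SPARK_VERSION. This is an assumption about the approach, not the code Darcy posted, and for true conditional compilation you would typically keep per-version source directories in the build instead.

object XYZ {
  // Parse "major.minor" out of the running Spark version, e.g. "3.2.1" -> (3, 2).
  private val Array(major, minor) =
    org.apache.spark.SPARK_VERSION.split("\\.").take(2).map(_.toInt)

  val isSpark32OrLater: Boolean = major > 3 || (major == 3 && minor >= 2)

  // Hypothetical feature gate used by the application.
  def useNewCodePath: Boolean = isSpark32OrLater
}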

Re: Spark Write BinaryType Column as continuous file to S3

2022-04-08 Thread Sean Owen
You can certainly write that UDF. You get a column in a DataFrame of array type and you can write that to any appropriate format. What do you mean by continuous byte stream? Something besides, say, Parquet files holding the byte arrays? On Fri, Apr 8, 2022 at 10:14 AM Philipp Kraus <
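
A minimal sketch of the UDF Sean describes, assuming a DataFrame df with two hypothetical numeric columns "x" and "y" that are packed into a fixed-width 16-byte record per row; the bucket and path are placeholders.

import java.nio.ByteBuffer
import org.apache.spark.sql.functions.{col, udf}

// Pack two doubles into a 16-byte array per row.
val packRow = udf { (x: Double, y: Double) =>
  val buf = ByteBuffer.allocate(16)
  buf.putDouble(x).putDouble(y)
  buf.array()
}

val withBytes = df.withColumn("payload", packRow(col("x"), col("y")))

// A standard format such as Parquet stores the resulting BinaryType column as-is.
withBytes.select("payload").write.parquet("s3a://my-bucket/packed/")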

Spark Write BinaryType Column as continuous file to S3

2022-04-08 Thread Philipp Kraus
Hello, I have got a data frame with numerical data in Spark 3.1.1 (Java) which should be converted to a binary file. My idea is that I create a UDF that generates a byte array based on the numerical values, so I can apply this function on each row of the data frame and then get a new
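
If a single continuous binary object on S3 really is the goal, here is one hedged sketch. It assumes a DataFrame withBytes with a BinaryType column "payload" (for example from a packing UDF like the one sketched earlier in this thread), that the concatenated result is small enough to stream through the driver, and that the s3a connector is configured; the bucket and key are placeholders.

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

// Pull the raw byte arrays out of the DataFrame.
val payloads = withBytes.select("payload").rdd.map(_.getAs[Array[Byte]](0))

// Stream them, in order, into one object via the Hadoop FileSystem API.
val fs  = FileSystem.get(new URI("s3a://my-bucket"), spark.sparkContext.hadoopConfiguration)
val out = fs.create(new Path("s3a://my-bucket/output/data.bin"))
try payloads.toLocalIterator.foreach(bytes => out.write(bytes)) finally out.close()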

Re: Aggregate over a column: the proper way to do

2022-04-08 Thread Sean Owen
Dataset.count() returns one value directly? On Thu, Apr 7, 2022 at 11:25 PM sam smith wrote: > My bad, yes of course that! Still I don't like the .. select("count(myCol)") .. part in my line; is there any replacement for that? > > On Fri, Apr 8, 2022 at 06:13, Sean Owen wrote: > >> Just do
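
A short sketch of the two idiomatic alternatives being discussed, assuming a DataFrame df with a column named myCol:

import org.apache.spark.sql.functions.{col, count}

// Count of all rows, returned directly as a Long.
val total: Long = df.count()

// Count of non-null values of myCol, without the SQL-string select.
val nonNullMyCol: Long = df.agg(count(col("myCol"))).first().getLong(0)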

Re: Spark 3.0.1 and spark 3.2 compatibility

2022-04-08 Thread Gourav Sengupta
Hi, I absolutely agree with Sean; besides that, please see the release notes for Spark versions as well, as they do mention any issues around compatibility. Regards, Gourav. On Thu, Apr 7, 2022 at 6:32 PM Sean Owen wrote: > (Don't cross post please) > Generally you definitely want to compile and
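
As a hedged illustration of "compile against what you run on", a build.sbt fragment along these lines is typical; the 3.2.1 version string is only an example.

// Compile against the Spark line you deploy on, and mark Spark as "provided"
// so the cluster's own jars are used at runtime.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.1" % "provided"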