Re: about memory size for loading file

2022-01-13 Thread frakass
So in this case I have 3 partitions, and each processes about 3.333 GB of data, am I right? On 2022/1/14 2:20, Sonal Goyal wrote: No, it should not. The file would be partitioned and read across each node. On Fri, 14 Jan 2022 at 11:48 AM, frakass wrote: Hello list

Re: about memory size for loading file

2022-01-13 Thread Sonal Goyal
No, it should not. The file would be partitioned and read across each node. On Fri, 14 Jan 2022 at 11:48 AM, frakass wrote: > Hello list > > Given the case I have a file whose size is 10GB. The RAM of the total > cluster is 24GB, three nodes. So the local node has only 8GB. > If I load this file

about memory size for loading file

2022-01-13 Thread frakass
Hello list. Suppose I have a file whose size is 10 GB. The total RAM of the cluster is 24 GB across three nodes, so the local node has only 8 GB. If I load this file into Spark as an RDD via the sc.textFile interface, will this operation run into an "out of memory" issue? Thank you.
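
As a rough illustration of the arithmetic behind this question, here is a minimal sketch of how many input partitions a file of this size might be split into. It assumes an uncompressed, splittable text file and a split size of 128 MB (a common HDFS block size; the real value depends on the file system and configuration). `estimate_partitions` is a hypothetical helper written for this example, not a Spark API.

```python
import math

# Assumption: 128 MB per input split. The actual split size depends on
# the underlying file system and on Spark/Hadoop configuration.
ASSUMED_SPLIT_BYTES = 128 * 1024**2


def estimate_partitions(file_bytes: int, split_bytes: int = ASSUMED_SPLIT_BYTES) -> int:
    """Hypothetical helper: estimate the partition count for a splittable file."""
    return max(1, math.ceil(file_bytes / split_bytes))


file_bytes = 10 * 1024**3  # the 10 GB file from the question
num_partitions = estimate_partitions(file_bytes)
per_partition_mb = file_bytes / num_partitions / 1024**2

print(f"partitions: {num_partitions}, about {per_partition_mb:.0f} MB each")
```

Under these assumptions the file is read as many small partitions rather than one piece per node, and each task handles one partition at a time, so the whole file does not normally have to fit in a single node's memory just to be read. Caching the RDD or collecting it to the driver is a different matter.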

Spark on Oracle available as an Apache licensed open source repo

2022-01-13 Thread Harish Butani
Spark on Oracle is now available as an open source, Apache-licensed GitHub repo. Build and deploy it as an extension jar in your Spark clusters. Use it to combine Apache Spark programs with data in your existing Oracle databases without expensive data

Spark Unary Transformer Example

2022-01-13 Thread Alana Young
I am trying to run the Unary Transformer example provided by Spark (https://github.com/apache/spark/blob/v3.1.2/examples/src/main/scala/org/apache/spark/examples/ml/UnaryTransformerExample.scala

Re: Does Spark 3.1.2/3.2 support log4j 2.17.1+, and how? your target release day for Spark3.3?

2022-01-13 Thread Sean Owen
Yes. However, Spark does not use the SocketServer mentioned in CVE-2019-17571, so it is not affected. 3.3.0 will probably be out in a couple of months. On Thu, Jan 13, 2022 at 3:14 AM Juan Liu wrote: > We are informed that CVE-2021-4104 is not the only problem with Log4J 1.x. > There is one more