Re: about memory size for loading file

2022-01-13 Thread frakass

So in this case I have 3 partitions, each processing about 3.33 GB of data. Am I right?
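For what it's worth, sc.textFile does not create one partition per node: it splits the file by input-format block size (128 MB by default on HDFS), so a 10 GB file arrives as roughly 80 partitions, and ~3.33 GB per node is only the average share of data, not a partition size. A sketch, assuming a live SparkContext `sc` and a hypothetical path:

```scala
// Assumes a running SparkContext `sc`; the HDFS path is illustrative.
// textFile partitions by split size (~128 MB), not by node count.
val rdd = sc.textFile("hdfs:///data/10gb.txt")
println(rdd.getNumPartitions)   // roughly 80 for a 10 GB file on 128 MB blocks
println(10.0 * 1024 / 128)      // 80.0 -- expected partition count
```

Each executor then works through its queue of ~128 MB partitions; no single task needs 3.33 GB at once.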


On 2022/1/14 2:20, Sonal Goyal wrote:

No it should not. The file would be partitioned and read across each node.

On Fri, 14 Jan 2022 at 11:48 AM, frakass wrote:


Hello list

Say I have a file whose size is 10 GB. The total RAM of the
cluster is 24 GB across three nodes, so each node has only 8 GB.
If I load this file into Spark as an RDD via the sc.textFile interface, will
this operation run into an "out of memory" issue?

Thank you.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org


--
Cheers,
Sonal
https://github.com/zinggAI/zingg 






Re: about memory size for loading file

2022-01-13 Thread Sonal Goyal
No it should not. The file would be partitioned and read across each node.
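A minimal sketch of why a full pass over the file need not exhaust memory (assuming a live SparkContext `sc`; the path is hypothetical):

```scala
// Assumes a running SparkContext `sc`; the HDFS path is illustrative.
val rdd = sc.textFile("hdfs:///data/10gb.txt")

// An action like count() streams each ~128 MB partition through an
// executor; Spark never tries to hold all 10 GB in memory at once.
println(rdd.count())

// Memory pressure appears only if you explicitly pin the data, e.g.
// rdd.cache() with MEMORY_ONLY, which tries to keep partitions in RAM
// and evicts (and later recomputes) those that do not fit.
```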

On Fri, 14 Jan 2022 at 11:48 AM, frakass wrote:

> Hello list
>
> Say I have a file whose size is 10 GB. The total RAM of the
> cluster is 24 GB across three nodes, so each node has only 8 GB.
> If I load this file into Spark as an RDD via the sc.textFile interface, will
> this operation run into an "out of memory" issue?
>
> Thank you.
>
--
Cheers,
Sonal
https://github.com/zinggAI/zingg


about memory size for loading file

2022-01-13 Thread frakass

Hello list

Say I have a file whose size is 10 GB. The total RAM of the
cluster is 24 GB across three nodes, so each node has only 8 GB.
If I load this file into Spark as an RDD via the sc.textFile interface, will
this operation run into an "out of memory" issue?


Thank you.




Spark on Oracle available as an Apache licensed open source repo

2022-01-13 Thread Harish Butani
Spark on Oracle is now available as an open source, Apache licensed GitHub
repo. Build and deploy it as an extension jar in your Spark clusters.

Use it to combine Apache Spark programs with data in your existing Oracle
databases without expensive data copying or query time data movement.

The core capability is Optimizer extensions that collapse SQL operator
sub-graphs to an OraScan that executes equivalent SQL in Oracle. Physical
plan parallelism can be controlled to split Spark tasks to operate on
Oracle data block ranges, on result-set pages, or on table partitions.

We push down large parts of Spark SQL to Oracle; for example, 95 of 99 TPCDS
queries are completely pushed to Oracle.
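For contrast, stock Spark's built-in JDBC source pushes down only projections and filters, leaving joins and aggregates to run in Spark. A minimal stock-Spark read looks like the following (this is not the extension's API; the connection details, table, and columns are hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("oracle-jdbc").getOrCreate()

// Standard Spark JDBC read; url/user/password/table are illustrative.
val sales = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")
  .option("dbtable", "SALES")
  .option("user", "scott")
  .option("password", "tiger")
  .load()

// Only this filter and the column pruning reach Oracle; any join or
// aggregate on top would execute in Spark, which is the data movement
// the extension's deeper pushdown is meant to avoid.
sales.filter(col("AMOUNT") > 100).select("ORDER_ID", "AMOUNT").show()
```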


With Spark SQL macros you can write custom Spark UDFs that get translated
and pushed down as Oracle SQL expressions.

With DML pushdown, inserts in Spark SQL get pushed as transactionally
consistent inserts/updates on Oracle tables.

See the Quick Start Guide on how to set up an Oracle free tier ADW
instance, load it with TPCDS data, and try out the Spark on Oracle Demo
on your Spark cluster.

More details can be found in our blog and the project wiki.

regards,
Harish Butani


Spark Unary Transformer Example

2022-01-13 Thread Alana Young
I am trying to run the UnaryTransformer example provided by Spark
(https://github.com/apache/spark/blob/v3.1.2/examples/src/main/scala/org/apache/spark/examples/ml/UnaryTransformerExample.scala).
I created a Zeppelin notebook and copied and pasted the example. The only
change I made was to save and load the transformer to/from a predefined
directory instead of creating a temporary directory. I am able to save the
transformer, but am running into the following exception when loading the saved
transformer (line 108):

java.lang.NoSuchMethodException:
UnaryTransformerExample$MyTransformer.<init>(java.lang.String)
  at java.base/java.lang.Class.getConstructor0(Class.java:3349)
  at java.base/java.lang.Class.getConstructor(Class.java:2151)
  at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:468)
  at org.apache.spark.ml.util.MLReadable.load(ReadWrite.scala:355)
  at org.apache.spark.ml.util.MLReadable.load$(ReadWrite.scala:355)
  at UnaryTransformerExample$MyTransformer$.load(<console>:84)
  at UnaryTransformerExample$.main(<console>:109)
  ... 44 elided
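That NoSuchMethodException means DefaultParamsReader tried to instantiate the transformer by reflection through a public single-String (uid) constructor and could not find one; classes pasted into a notebook/REPL cell get wrapper-mangled names, which typically breaks this lookup even when the constructor exists. A sketch of the shape `load` expects, compiled as a top-level class in a jar on the cluster rather than defined in a notebook cell (class and UID names are illustrative):

```scala
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.types.{DataType, DataTypes}

// Top-level class (not nested inside a notebook cell), keeping the public
// single-String uid constructor that DefaultParamsReader finds by reflection.
class MyTransformer(override val uid: String)
  extends UnaryTransformer[Double, Double, MyTransformer]
  with DefaultParamsWritable {

  def this() = this(Identifiable.randomUID("myT"))

  override protected def createTransformFunc: Double => Double = _ + 1.0
  override protected def outputDataType: DataType = DataTypes.DoubleType
}

// The companion object provides MyTransformer.load(path).
object MyTransformer extends DefaultParamsReadable[MyTransformer]
```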

Any feedback is appreciated. Thank you.

Additional Details

Spark Version: 3.1.2
Zeppelin Version: 0.9.0

Re: Does Spark 3.1.2/3.2 support log4j 2.17.1+, and how? Your target release date for Spark 3.3?

2022-01-13 Thread Sean Owen
Yes; Spark does not use the SocketServer mentioned in CVE-2019-17571,
so it is not affected.
3.3.0 would probably be out in a couple of months.

On Thu, Jan 13, 2022 at 3:14 AM Juan Liu wrote:

> We are informed that CVE-2021-4104 is not the only problem with Log4j 1.x.
> There is one more, CVE-2019-17571, and since Apache announced its EOL in 2015,
> Spark 3.3.0 will be very much expected. Do you think mid-2022 is a reasonable
> time for the Spark 3.3.0 release?
>
> *Juan Liu (刘娟) **PMP**®*
> Release Management, Watson Health, China Development Lab
> Email: liuj...@cn.ibm.com
> Phone: 86-10-82452506
>
>
>
>