Re: PySpark error java.lang.IllegalArgumentException

2023-07-10 Thread elango vaidyanathan
Finally I was able to solve this issue by setting this conf: "spark.driver.extraJavaOptions=-Dorg.xerial.snappy.tempdir=/my_user/temp_folder". Thanks all! On Sat, 8 Jul 2023 at 3:45 AM, Brian Huynh wrote: > Hi Khalid, > > Elango mentioned the file is working fine in another of our environments
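The conf above can also be passed at submit time; a minimal sketch, assuming the job is launched with spark-submit and that /my_user/temp_folder is a path writable by the driver (the script name is a placeholder):

```shell
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dorg.xerial.snappy.tempdir=/my_user/temp_folder" \
  my_job.py
```

Background: snappy-java extracts its native library into a temp directory at startup, so pointing it at a user-writable folder works around permission or noexec problems on the default /tmp.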

Re: PySpark error java.lang.IllegalArgumentException

2023-07-07 Thread Brian Huynh
Hi Khalid, Elango mentioned the file is working fine in another of our environments with the same driver and executor memory. Brian. On Jul 7, 2023, at 10:18 AM, Khalid Mammadov wrote: Perhaps that parquet file that is in that folder got corrupted? To check, try to read that file with pandas or other

Re: PySpark error java.lang.IllegalArgumentException

2023-07-07 Thread Khalid Mammadov
Perhaps that parquet file that is in that folder got corrupted? To check, try to read that file with pandas or other tools to see if you can read it without Spark. On Wed, 5 Jul 2023, 07:25 elango vaidyanathan wrote: > > Hi team, > > Any updates on the issue below? > > On Mon, 3 Jul 2023 at

Re: PySpark error java.lang.IllegalArgumentException

2023-07-05 Thread elango vaidyanathan
Hi team, Any updates on the issue below? On Mon, 3 Jul 2023 at 6:18 PM, elango vaidyanathan wrote: > > > Hi all, > > I am reading a parquet file like this and it gives > java.lang.IllegalArgumentException. > However I can work with other parquet files (such as the NYC taxi parquet > files)

PySpark error java.lang.IllegalArgumentException

2023-07-03 Thread elango vaidyanathan
Hi all, I am reading a parquet file like this and it gives java.lang.IllegalArgumentException. However I can work with other parquet files (such as the NYC taxi parquet files) without any issue. I have copied the full error log as well. Can you please check and let me know how to fix this?

Re: [PySpark] Error using SciPy: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

2023-01-06 Thread Oliver Ruebenacker
So I think now that my problem is Spark-related after all. It looks like my bootstrap script installs SciPy just fine in a regular environment, but somehow interaction with PySpark breaks it. On Fri, Jan 6, 2023 at 12:39 PM Bjørn Jørgensen wrote: > Create a Dockerfile > > FROM fedora > > RUN

Re: [PySpark] Error using SciPy: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

2023-01-06 Thread Bjørn Jørgensen
Create a Dockerfile FROM fedora RUN sudo yum install -y python3-devel RUN sudo pip3 install -U Cython && \ sudo pip3 install -U pybind11 && \ sudo pip3 install -U pythran && \ sudo pip3 install -U numpy && \ sudo pip3 install -U scipy docker build --pull --rm -f "Dockerfile" -t
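Reflowed for readability, the Dockerfile from the message reads roughly as below; note the image tag in the build command was cut off in the archive:

```dockerfile
FROM fedora
RUN sudo yum install -y python3-devel
RUN sudo pip3 install -U Cython && \
    sudo pip3 install -U pybind11 && \
    sudo pip3 install -U pythran && \
    sudo pip3 install -U numpy && \
    sudo pip3 install -U scipy
```

Built with something like: docker build --pull --rm -f "Dockerfile" -t scipy-test . (the tag "scipy-test" is a placeholder for the truncated original). The sudo inside RUN is redundant, since Docker builds run as root by default.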

Re: [PySpark] Error using SciPy: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

2023-01-06 Thread Mich Talebzadeh
https://stackoverflow.com/questions/66060487/valueerror-numpy-ndarray-size-changed-may-indicate-binary-incompatibility-exp

Re: [PySpark] Error using SciPy: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

2023-01-06 Thread Oliver Ruebenacker
Thank you for the link. I already tried most of what was suggested there, but without success. On Fri, Jan 6, 2023 at 11:35 AM Bjørn Jørgensen wrote: > https://stackoverflow.com/questions/66060487/valueerror-numpy-ndarray-size-changed-may-indicate-binary-incompatibility-exp

Re: [PySpark] Error using SciPy: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

2023-01-06 Thread Bjørn Jørgensen
https://stackoverflow.com/questions/66060487/valueerror-numpy-ndarray-size-changed-may-indicate-binary-incompatibility-exp On Fri, 6 Jan 2023 at 16:01, Oliver Ruebenacker < oliv...@broadinstitute.org> wrote: > > Hello, > > I'm trying to install SciPy using a bootstrap script and then use

[PySpark] Error using SciPy: ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject

2023-01-06 Thread Oliver Ruebenacker
Hello, I'm trying to install SciPy using a bootstrap script and then use it to calculate a new field in a dataframe, running on AWS EMR. Although the SciPy website states that only NumPy is needed, when I tried to install SciPy using pip, pip kept failing, complaining about missing
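This ValueError typically means SciPy (or another compiled extension) was built against a different NumPy ABI than the one installed at runtime; a hedged bootstrap sketch, assuming an Amazon Linux image (the package names are the usual build prerequisites, not tested against this cluster):

```shell
#!/bin/bash
# Hypothetical EMR bootstrap sketch: install build prerequisites, then
# install NumPy before SciPy so both come from mutually compatible wheels.
sudo yum install -y python3-devel gcc gcc-gfortran
sudo python3 -m pip install --upgrade pip
sudo python3 -m pip install --upgrade numpy
sudo python3 -m pip install --upgrade scipy
```

Reinstalling both packages together, so pip resolves a compatible pair, is the standard remedy for the "expected 88 from C header, got 80 from PyObject" mismatch.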

Pyspark error when converting string to timestamp in map function

2018-08-17 Thread Keith Chapman
Hi all, I'm trying to create a dataframe enforcing a schema so that I can write it to a parquet file. The schema has timestamps and I get an error with pyspark. The following is a snippet of code that exhibits the problem, df = sqlctx.range(1000) schema = StructType([StructField('a',
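The code snippet is cut off, but independent of the exact schema, the usual cause is handing TimestampType a raw string; converting to datetime in the map function first avoids the error. A Spark-free sketch of the conversion step (the format string is an assumption):

```python
from datetime import datetime

def parse_ts(s):
    # PySpark's TimestampType expects datetime objects, not strings;
    # rows built with raw strings fail when the schema is enforced.
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")

ts = parse_ts("2018-08-17 12:30:00")
print(ts)
```

In the map function, apply parse_ts to the string field before building the Row that the schema is applied to.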

Re: Pandas UDF for PySpark error. Big Dataset

2018-05-29 Thread Bryan Cutler
Can you share some of the code used, or at least the pandas_udf plus the stack trace? Also, does decreasing your dataset size fix the OOM? On Mon, May 28, 2018, 4:22 PM Traku traku wrote: > Hi. > > I'm trying to use the new feature but I can't use it with a big dataset > (about 5 million rows).

Pandas UDF for PySpark error. Big Dataset

2018-05-28 Thread Traku traku
Hi. I'm trying to use the new feature but I can't use it with a big dataset (about 5 million rows). I tried increasing executor memory, driver memory, and the partition count, but none of these solved the problem. One of the executor tasks grows its shuffle memory until it fails. The error is

Re: Pyspark Error: Unable to read a hive table with transactional property set as 'True'

2018-03-02 Thread ayan guha
Hi, a couple of questions: 1. It seems the error is due to a number format issue: Caused by: java.util.concurrent.ExecutionException: java.lang.NumberFormatException: For input string: "0003024_" at java.util.concurrent.FutureTask.report(FutureTask.java:122) at

Pyspark Error: Unable to read a hive table with transactional property set as 'True'

2018-03-02 Thread Debabrata Ghosh
Hi All, Greetings! I need some help reading a Hive table via PySpark for which the transactional property is set to 'True' (in other words, the ACID property is enabled). Following are the entire stacktrace and the description of the Hive table. Would you please be able to help

Fwd: pyspark: Error when training a GMM with an initial GaussianMixtureModel

2015-11-25 Thread Guillaume Maze
Hi all, We're trying to train a Gaussian Mixture Model (GMM) with a specified initial model. The 1.5.1 docs say we should use a GaussianMixtureModel object as input for the "initialModel" parameter to the GaussianMixture.train method. Before creating our own initial model (the plan is to use a KMeans

Re: Pyspark: "Error: No main class set in JAR; please specify one with --class"

2015-10-01 Thread Marcelo Vanzin
1.3-jar-with-dependencies.jar" > > I got the error "Error: No main class set in JAR; please specify one with > --class". > > How do I specify the class for just the second JAR?

Re: Pyspark: "Error: No main class set in JAR; please specify one with --class"

2015-10-01 Thread Ted Yu
second JAR? > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Pyspark-Error-No-main-class-set-in-JAR-please-specify-one-with-class-tp24900.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. >

Pyspark: "Error: No main class set in JAR; please specify one with --class"

2015-10-01 Thread YaoPau
specify one with --class". How do I specify the class for just the second JAR?
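For a PySpark job, --class applies only when the main artifact is a JAR; extra JARs go on --jars and the entry point stays the Python file. A sketch with placeholder paths:

```shell
# PySpark entry point: no --class needed, extra JARs listed on --jars.
spark-submit \
  --jars /path/to/second.jar \
  my_script.py

# --class is only required when the main artifact itself is a JAR:
spark-submit \
  --class com.example.Main \
  --jars /path/to/second.jar \
  /path/to/app.jar
```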

pyspark error with zip

2015-03-31 Thread Charles Hayden
The following program fails in the zip step. x = sc.parallelize([1, 2, 3, 1, 2, 3]) y = sc.parallelize([1, 2, 3]) z = x.distinct() print x.zip(y).collect() The error that is produced depends on whether multiple partitions have been specified or not. I understand that the two RDDs [must]
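Spark's RDD.zip requires both RDDs to have the same number of partitions and the same number of elements per partition, which distinct() generally breaks. The element-count constraint can be seen with a plain-Python analogue (no Spark needed):

```python
x = [1, 2, 3, 1, 2, 3]   # sc.parallelize([1, 2, 3, 1, 2, 3])
y = [1, 2, 3]            # sc.parallelize([1, 2, 3])
z = sorted(set(x))       # analogue of x.distinct() (Spark does not guarantee order)

# x.zip(y) fails in Spark because the element counts differ (6 vs 3);
# z and y line up element for element, which is why zipping the
# distinct RDD can succeed where zipping x cannot.
pairs = list(zip(z, y))
print(pairs)
```

When the counts match but the partitioning differs, the usual workaround is zipWithIndex on each RDD followed by a join on the index.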

Re: Pyspark Error

2014-11-18 Thread Shannon Quinn
My best guess would be a networking issue--it looks like the Python socket library isn't able to connect to whatever hostname you're providing Spark in the configuration. On 11/18/14 9:10 AM, amin mohebbi wrote: Hi there, *I have already downloaded Pre-built spark-1.1.0, I want to run

Re: Pyspark Error

2014-11-18 Thread Davies Liu
It seems that `localhost` cannot be resolved on your machines; I have filed https://issues.apache.org/jira/browse/SPARK-4475 to track it. On Tue, Nov 18, 2014 at 6:10 AM, amin mohebbi aminn_...@yahoo.com.invalid wrote: Hi there, I have already downloaded the pre-built spark-1.1.0, I want to run
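The diagnosis can be checked directly with the standard library; if the lookup raises socket.gaierror, adding a "127.0.0.1  localhost" line to /etc/hosts is the usual fix:

```python
import socket

# PySpark's driver and workers connect back over "localhost"; if this
# resolution fails, the Python-side sockets fail the same way.
addr = socket.gethostbyname("localhost")
print(addr)
```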

Re: Pyspark Error when broadcast numpy array

2014-11-12 Thread bliuab
) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Pyspark-Error-when-broadcast-numpy-array-tp18662.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Pyspark Error when broadcast numpy array

2014-11-11 Thread bliuab
a = sc.broadcast(vec)

Re: Pyspark Error when broadcast numpy array

2014-11-11 Thread Davies Liu

Re: Pyspark Error when broadcast numpy array

2014-11-11 Thread bliuab
environment sc = SparkContext(conf=conf, batchSize=1) vec = np.random.rand(3500) a = sc.broadcast(vec)

Re: Pyspark Error when broadcast numpy array

2014-11-11 Thread Davies Liu
vec = np.random.rand(3500) a = sc.broadcast(vec)

PySpark Error on Windows with sc.wholeTextFiles

2014-10-16 Thread Griffiths, Michael (NYC-RPM)
Hi, I'm running into an error on Windows (x64, 8.1) running Spark 1.1.0 (pre-built for Hadoop 2.4: http://d3kbcqa49mib13.cloudfront.net/spark-1.1.0-bin-hadoop2.4.tgz) with Java SE Version 8 Update 20 (build 1.8.0_20-b26); just getting started with Spark. When

Re: PySpark Error on Windows with sc.wholeTextFiles

2014-10-16 Thread Davies Liu
It's a bug, could you file a JIRA for this? Thanks! Davies. On Thu, Oct 16, 2014 at 8:28 AM, Griffiths, Michael (NYC-RPM) michael.griffi...@reprisemedia.com wrote: Hi, I'm running into an error on Windows (x64, 8.1) running Spark 1.1.0 (pre-built for Hadoop 2.4: