Re: Testing another Dataset after ML training

2017-07-12 Thread Riccardo Ferrari
Hi Michael, I don't see any attachment; I'm not sure you can attach files here, though. On Tue, Jul 11, 2017 at 10:44 PM, Michael C. Kunkel wrote: > Greetings, > > Thanks for the communication. > > I attached the entire stacktrace which was output to the screen. > I tried to

Re: Testing another Dataset after ML training

2017-07-12 Thread Riccardo Ferrari
Hi Michael, I think I found your posting on SO: https://stackoverflow.com/questions/45041677/java-spark-training-on-new-data-with-datasetrow-from-csv-file The exception trace there is quite different from what I read here, and it is indeed self-explanatory: ... Caused by:

Re: Testing another Dataset after ML training

2017-07-12 Thread Michael C. Kunkel
Greetings Riccardo, That is indeed my post; it is my second attempt at getting this to work. I am not sure if the vector sizes are different, as I know the "unknown" data is just a blind copy of 3 of the inputs used for the training data. I will pursue this avenue more. Thanks for
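For reference, the usual pattern for scoring new data is to run it through the same fitted feature pipeline that was used for training, so the assembled vectors have the same size and column order. A minimal Scala sketch, assuming (this is not stated in the thread) that the features were built with an ML Pipeline saved as a PipelineModel; the paths and column handling are hypothetical:

    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("score-new-data").getOrCreate()

    // Load the pipeline that was fitted on the training data (path is hypothetical).
    val model = PipelineModel.load("/tmp/trained-pipeline")

    // Read the "unknown" CSV and let the *same* feature stages
    // (e.g. a VectorAssembler) build the feature vector.
    val newData = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/tmp/unknown.csv")

    val predictions = model.transform(newData)
    predictions.show()

If the new CSV is assembled with a separately constructed VectorAssembler, any difference in the selected columns will produce vectors of a different size than the model expects.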

Re: With 2.2.0 PySpark is now available for pip install from PyPI :)

2017-07-12 Thread Hyukjin Kwon
Cool! 2017-07-13 9:43 GMT+09:00 Denny Lee: > This is amazingly awesome! :)

Re: With 2.2.0 PySpark is now available for pip install from PyPI :)

2017-07-12 Thread Denny Lee
This is amazingly awesome! :) On Wed, Jul 12, 2017 at 13:23 lucas.g...@gmail.com wrote: > That's great!

CVE-2017-7678 Apache Spark XSS web UI MHTML vulnerability

2017-07-12 Thread Sean Owen
Severity: Low Vendor: The Apache Software Foundation Versions Affected: Versions of Apache Spark before 2.2.0 Description: It is possible for an attacker to take advantage of a user's trust in the server to trick them into visiting a link that points to a shared Spark cluster and submits data

Re: Testing another Dataset after ML training

2017-07-12 Thread Kunkel, Michael C.
Greetings, The attachment I meant to refer to was the posting in the initial email on the mailing list. BR, MK Michael C. Kunkel, USMC, PhD Forschungszentrum Jülich Nuclear Physics Institute and Juelich Center for Hadron Physics Experimental Hadron Structure

Re: With 2.2.0 PySpark is now available for pip install from PyPI :)

2017-07-12 Thread lucas.g...@gmail.com
That's great! On 12 July 2017 at 12:41, Felix Cheung wrote: > Awesome! Congrats!!

Re: With 2.2.0 PySpark is now available for pip install from PyPI :)

2017-07-12 Thread Jeff Zhang
Awesome! On Thu, Jul 13, 2017 at 8:48 AM, Hyukjin Kwon wrote: > Cool!

[ML] Performance issues with GBTRegressor

2017-07-12 Thread OBones
Hello all, I'm using Spark for regression analysis on medium to large datasets, and its performance is very good when using random forests or decision trees. Continuing my experimentation, I started using GBTRegressor and am finding it extremely slow compared to R, while both other methods
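One thing worth checking (not from the thread itself): gradient-boosted trees in Spark ML are trained sequentially, one tree after another, so runtime is dominated by the parameters below. A minimal Scala sketch with purely illustrative values; column names are hypothetical:

    import org.apache.spark.ml.regression.GBTRegressor

    // Each of these knobs trades accuracy against training time.
    val gbt = new GBTRegressor()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setMaxIter(50)            // number of trees, built one after another
      .setMaxDepth(5)            // deeper trees are markedly slower
      .setMaxBins(32)            // fewer bins = faster split search
      .setSubsamplingRate(0.8)   // train each tree on a sample of the data
      .setCacheNodeIds(true)     // speeds up deep trees at the cost of memory
      .setCheckpointInterval(10) // avoids a long lineage over many iterations

    // val model = gbt.fit(trainingData)  // trainingData: DataFrame with label/features columns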

Re: Spark, S3A, and 503 SlowDown / rate limit issues

2017-07-12 Thread Steve Loughran
On 10 Jul 2017, at 21:57, Everett Anderson wrote: Hey, Thanks for the responses, guys! On Thu, Jul 6, 2017 at 7:08 AM, Steve Loughran wrote: On 5 Jul 2017, at 14:40, Vadim Semenov
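As a rough illustration of the kind of client-side tuning relevant when S3 returns 503 SlowDown (this sketch is not taken from the thread), standard Hadoop S3A properties can be forwarded through Spark with the spark.hadoop.* prefix; the values below are illustrative only:

    import org.apache.spark.sql.SparkSession

    // Hedged sketch: adjust S3A retry attempts and cap concurrent S3 connections.
    val spark = SparkSession.builder()
      .appName("s3a-tuning")
      .config("spark.hadoop.fs.s3a.attempts.maximum", "20")    // more retries on throttling
      .config("spark.hadoop.fs.s3a.connection.maximum", "100") // limit concurrent S3 connections
      .getOrCreate()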

Re: With 2.2.0 PySpark is now available for pip install from PyPI :)

2017-07-12 Thread Felix Cheung
Awesome! Congrats!!

DataFrameReader read from S3 org.apache.spark.sql.AnalysisException: Path does not exist

2017-07-12 Thread Sumona Routh
Hi there, I'm trying to read a list of paths from S3 into a dataframe for a window of time using the following: sparkSession.read.parquet(listOfPaths:_*) In some cases, the path may not be there because there is no data, which is an acceptable scenario. However, Spark throws an
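One way to handle this (a minimal Scala sketch, not taken from the thread; the names sparkSession and listOfPaths follow Sumona's snippet) is to filter the list down to paths that actually exist before calling the reader:

    import org.apache.hadoop.fs.Path

    // Keep only the paths that exist on the underlying filesystem (S3 here).
    val hadoopConf = sparkSession.sparkContext.hadoopConfiguration
    val existingPaths = listOfPaths.filter { p =>
      val path = new Path(p)
      path.getFileSystem(hadoopConf).exists(path)
    }

    val df =
      if (existingPaths.nonEmpty) sparkSession.read.parquet(existingPaths: _*)
      else sparkSession.emptyDataFrame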

With 2.2.0 PySpark is now available for pip install from PyPI :)

2017-07-12 Thread Holden Karau
Hi wonderful Python + Spark folks, I'm excited to announce that with Spark 2.2.0 we finally have PySpark published on PyPI (see https://pypi.python.org/pypi/pyspark / https://twitter.com/holdenkarau/status/885207416173756417). This has been a long time coming (previous releases included pip

Implementing Dynamic Sampling in a Spark Streaming Application

2017-07-12 Thread N B
Hi all, Spark has had a backpressure implementation since 1.5 that helps stabilize a Spark Streaming application by keeping the processing time per batch under control and below the batch interval. This implementation leaves excess records in the source (Kafka, Flume, etc.) and they get
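For reference, the built-in backpressure the post refers to is enabled with a configuration flag, typically combined with a per-partition rate cap for Kafka sources; a minimal Scala sketch with illustrative values:

    import org.apache.spark.SparkConf

    // Built-in backpressure (available since Spark 1.5) plus a ceiling on the Kafka ingest rate.
    val conf = new SparkConf()
      .setAppName("streaming-backpressure")
      .set("spark.streaming.backpressure.enabled", "true")
      .set("spark.streaming.kafka.maxRatePerPartition", "1000") // records/sec per partition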

Re: DataFrameReader read from S3 org.apache.spark.sql.AnalysisException: Path does not exist

2017-07-12 Thread Yong Zhang
Can't you just catch that exception and return an empty dataframe? Yong
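Yong's suggestion maps onto something like the following minimal Scala sketch (the names sparkSession and listOfPaths follow Sumona's original snippet):

    import org.apache.spark.sql.{AnalysisException, DataFrame}

    // Fall back to an empty DataFrame when the read fails because no path exists.
    val df: DataFrame =
      try {
        sparkSession.read.parquet(listOfPaths: _*)
      } catch {
        case _: AnalysisException => sparkSession.emptyDataFrame
      }

Note that emptyDataFrame has no columns, so downstream code that expects the parquet schema needs to handle that case explicitly.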