Re: Vector size mismatch in logistic regression - Spark ML 2.0

2016-08-21 Thread janardhan shetty
Thanks Krishna for your response. Features in the training set have more categories than the test set, so when VectorAssembler is used these numbers are usually different, and I believe that is expected, right? The test dataset usually will not have as many categories in its features as Train is the
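The mismatch this thread is chasing can be illustrated without Spark at all. Below is a plain-Scala sketch (all names made up): a category index fitted separately on train and test yields one-hot vectors of different widths, which is exactly what makes the model reject the test vectors. The fix in Spark ML is to fit the StringIndexer/OneHotEncoder stages on the training data only and reuse the fitted pipeline to transform the test data.

```scala
// Hypothetical stand-in for StringIndexer + OneHotEncoder (no Spark needed).
def fitIndex(values: Seq[String]): Map[String, Int] =
  values.distinct.zipWithIndex.toMap

def oneHot(value: String, index: Map[String, Int]): Array[Double] = {
  val v = Array.fill(index.size)(0.0)
  index.get(value).foreach(i => v(i) = 1.0)
  v
}

val trainCats = Seq("US", "UK", "DE", "FR") // 4 categories in training data
val testCats  = Seq("US", "UK")             // only 2 appear in test data

// Fitting per dataset (the mistake): widths disagree.
val trainWidth = oneHot("US", fitIndex(trainCats)).length
val testWidth  = oneHot("US", fitIndex(testCats)).length

// Reusing the train-fitted index (the fix): widths agree.
val fixedWidth = oneHot("US", fitIndex(trainCats)).length
```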

Re: Entire XML data as one of the column in DataFrame

2016-08-21 Thread Hyukjin Kwon
I can't say this is the best way to do so, but my instant thought is as below: create two DataFrames. sc.hadoopConfiguration.set(XmlInputFormat.START_TAG_KEY, s"") sc.hadoopConfiguration.set(XmlInputFormat.END_TAG_KEY, s"") sc.hadoopConfiguration.set(XmlInputFormat.ENCODING_KEY, "UTF-8") val strXmlDf =
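The archive has stripped the tag literals out of the `s""` interpolations above. With spark-xml they would be the opening and closing forms of your row element; using a hypothetical `<book>` row tag for illustration, the configuration would look like:

```scala
// Hypothetical reconstruction: <book> stands in for whatever the actual
// row element of the XML document is.
import com.databricks.spark.xml.XmlInputFormat

sc.hadoopConfiguration.set(XmlInputFormat.START_TAG_KEY, "<book>")
sc.hadoopConfiguration.set(XmlInputFormat.END_TAG_KEY, "</book>")
sc.hadoopConfiguration.set(XmlInputFormat.ENCODING_KEY, "UTF-8")
```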

Hi,

2016-08-21 Thread Xi Shen
I found there are several .conf files in the conf directory. Which one is used as the default when I click the "new" button on the notebook homepage? I want to edit the default profile configuration so all my notebooks are created with custom settings. -- Thanks, David S.

Re: Vector size mismatch in logistic regression - Spark ML 2.0

2016-08-21 Thread Krishna Sankar
Hi, Just after I sent the mail, I realized that the error might be with the training dataset, not the test dataset. 1. It might be that you are feeding the full Y vector for training. 2. Which could mean you are using a ~50-50 training-test split. 3. Take a good look at the code that

Re: Vector size mismatch in logistic regression - Spark ML 2.0

2016-08-21 Thread Krishna Sankar
Hi, Looks like the test dataset has different sizes for X & Y. Possible steps: 1. What is the test data size? If it is 15,909, check the prediction variable vector: it is now 29,471 but should be 15,909. If you expect it to be 29,471, then the X matrix is not right.

Vector size mismatch in logistic regression - Spark ML 2.0

2016-08-21 Thread janardhan shetty
Hi, I have built a logistic regression model using a training dataset. When I predict on a test dataset, it throws the size-mismatch error below. Steps done: 1. String indexers on categorical features. 2. One-hot encoding on these indexed features. Any help is appreciated to

Re: submitting spark job with kerberized Hadoop issue

2016-08-21 Thread Aneela Saleem
Any update on this? On Tuesday, 16 August 2016, Aneela Saleem wrote: > Thanks Steve, > > I went through this but still not able to fix the issue > > On Mon, Aug 15, 2016 at 2:01 AM, Steve Loughran

Re: Accessing HBase through Spark with Security enabled

2016-08-21 Thread Aneela Saleem
Any update on this? On Tuesday, 16 August 2016, Aneela Saleem wrote: > Thanks Steve, > > I have gone through it's documentation, i did not get any idea how to > install it. Can you help me? > > On Mon, Aug 15, 2016 at 4:23 PM, Steve Loughran

RE: Flattening XML in a DataFrame

2016-08-21 Thread srikanth.jella
Hi Hyukjin, I have created the below issue. https://github.com/databricks/spark-xml/issues/155 Sent from Mail for Windows 10 From: Hyukjin Kwon

Entire XML data as one of the column in DataFrame

2016-08-21 Thread srikanth.jella
Hello Experts, I’m using the spark-xml package, which is automatically inferring my schema and creating a DataFrame. I’m extracting a few fields like id and name (which are unique) from the XML below, but my requirement is to store the entire XML in one of the columns as well. I’m writing this data to AVRO

Re: Spark Streaming application failing with Token issue

2016-08-21 Thread Mich Talebzadeh
Hi Kamesh, The message you are getting after 7 days: PriviledgedActionException as:sys_bio_replicator (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(org.apache. hadoop.security.token.SecretManager$InvalidToken): Token has expired Sounds like an IPC issue with Kerberos

Re: How to continuous update or refresh RandomForestClassificationModel

2016-08-21 Thread Jacek Laskowski
Hi, That's my understanding -- you need to fit another model given the training data. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Fri, Aug 19, 2016 at

Re: Plans for improved Spark DataFrame/Dataset unit testing?

2016-08-21 Thread Everett Anderson
On Sun, Aug 21, 2016 at 3:08 AM, Bedrytski Aliaksandr wrote: > Hi, > > we share the same spark/hive context between tests (executed in > parallel), so the main problem is that the temporary tables are > overwritten each time they are created, this may create race conditions >

Re: Reporting errors from spark sql

2016-08-21 Thread Jacek Laskowski
Hi, See https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala#L65 to learn how Spark SQL parses SQL texts. It could give you a way out. Pozdrawiam, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache

Re: Unsubscribe

2016-08-21 Thread Rahul Palamuttam
Hi Sudhanshu, Try user-unsubscribe@spark.apache.org - Rahul P Sent from my iPhone > On Aug 21, 2016, at 9:19 AM, Sudhanshu Janghel > wrote: > > Hello, > > I wish to unsubscribe from the channel. > > KIND REGARDS, > SUDHANSHU

Unsubscribe

2016-08-21 Thread Sudhanshu Janghel
Hello, I wish to unsubscribe from the channel. KIND REGARDS, SUDHANSHU

Re: Spark Streaming application failing with Token issue

2016-08-21 Thread Jacek Laskowski
Hi Kamesh, I believe your only option is to re-start your application every 7 days (perhaps you need to enable checkpointing). See https://github.com/apache/spark/commit/ab648c0004cfb20d53554ab333dd2d198cb94ffa for a change with automatic security token renewal. Pozdrawiam, Jacek Laskowski
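Besides restarting, a commonly used alternative on YARN is to hand Spark a keytab so it can re-obtain delegation tokens itself instead of failing when the 7-day maximum token lifetime is hit. A hedged sketch of the submit command (principal, paths, and class names below are placeholders, not from the thread):

```shell
# With --principal/--keytab, the YARN backend periodically re-logs in from
# the keytab and refreshes delegation tokens for long-running apps.
spark-submit \
  --master yarn \
  --principal sys_bio_replicator@EXAMPLE.COM \
  --keytab /etc/security/keytabs/replicator.keytab \
  --class com.example.StreamingApp \
  app.jar
```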

Re: Best way to read XML data from RDD

2016-08-21 Thread Darin McBeath
Another option would be to look at spark-xml-utils. We use this extensively in the manipulation of our XML content. https://github.com/elsevierlabs-os/spark-xml-utils There are quite a few examples. Depending on your preference (and what you want to do), you could use xpath, xquery, or

Re: Dataframe corrupted when sqlContext.read.json on a Gzipped file that contains more than one file

2016-08-21 Thread Sean Owen
You are attempting to read a tar file. That won't work. A compressed JSON file would. On Sun, Aug 21, 2016, 12:52 Chua Jie Sheng wrote: > Hi Spark user list! > > I have been encountering corrupted records when reading Gzipped files that > contains more than one file. > >
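Sean's point can be demonstrated in a shell: a tar archive wrapped in gzip decompresses to tar headers plus file contents, not to JSON, whereas concatenated gzip members decompress to the plain concatenation of the original files, which is exactly the line-per-record input `sqlContext.read.json` expects.

```shell
# Two one-record JSON-lines files.
printf '{"id":1}\n' > a.json
printf '{"id":2}\n' > b.json

# What the thread did (roughly): tar + gzip. Decompressing this yields
# tar header bytes around the content -- corrupt records to a JSON reader.
tar -czf bad.json.gz a.json b.json

# What works: gzip each file and concatenate the members. Multi-member
# gzip decompresses to the concatenation of the members.
gzip -c a.json > good.json.gz
gzip -c b.json >> good.json.gz

gunzip -c good.json.gz   # two clean JSON lines
```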

Dataframe corrupted when sqlContext.read.json on a Gzipped file that contains more than one file

2016-08-21 Thread Chua Jie Sheng
Hi Spark user list! I have been encountering corrupted records when reading Gzipped files that contain more than one file. Example: I have two .json files, [a.json, b.json]. Each has multiple records (one line, one record). I tarred both of them together on Mac OS X 10.11.6, bsdtar 2.8.3 -

Re: Best way to read XML data from RDD

2016-08-21 Thread Hyukjin Kwon
Hi Diwakar, the Spark XML library can take an RDD as source. ``` val df = new XmlReader() .withRowTag("book") .xmlRdd(sqlContext, rdd) ``` If performance is critical, I would also recommend taking care of the creation and destruction of the parser. If the parser is not serializable, then you can do
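The per-partition parser pattern the reply is alluding to can be sketched without Spark. The stand-in `mapPartitions` below mimics `rdd.mapPartitions`: the (non-serializable) parser is built once per partition and reused for every record, instead of once per record. The JDK `DocumentBuilder` plays the role of the XML parser; all record contents are made up.

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory

// Plain-Scala stand-in for rdd.mapPartitions.
def mapPartitions[A, B](partitions: Seq[Seq[A]])(f: Iterator[A] => Iterator[B]): Seq[B] =
  partitions.flatMap(p => f(p.iterator))

val partitions = Seq(
  Seq("<book><id>1</id></book>", "<book><id>2</id></book>"),
  Seq("<book><id>3</id></book>")
)

val ids = mapPartitions(partitions) { records =>
  // One parser per partition: created once here, reused for each record.
  val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
  records.map { xml =>
    val doc = builder.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
    doc.getElementsByTagName("id").item(0).getTextContent
  }
}
```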

Re: Plans for improved Spark DataFrame/Dataset unit testing?

2016-08-21 Thread Bedrytski Aliaksandr
Hi, we share the same Spark/Hive context between tests (executed in parallel), so the main problem is that the temporary tables are overwritten each time they are created. This may create race conditions, as these temp tables may be seen as global mutable shared state. So each time we create a
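One common workaround for the clash described above is to stop sharing a fixed table name at all: give each test a unique name and pass it through the test. A minimal sketch in plain Scala (in Spark, the generated name would be what you hand to `df.createOrReplaceTempView`):

```scala
import java.util.UUID

// Each call yields a fresh, SQL-identifier-safe table name, so parallel
// tests registering "users" no longer overwrite each other's temp view.
def uniqueTableName(base: String): String =
  s"${base}_${UUID.randomUUID().toString.replace("-", "")}"

val t1 = uniqueTableName("users")
val t2 = uniqueTableName("users")
```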

DCOS - s3

2016-08-21 Thread Martin Somers
I'm having trouble loading data from an s3 repo. Currently DCOS is running Spark 2, so I'm not sure if there is a modification to the code with the upgrade. My code at the moment looks like this: sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "xxx") sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "xxx")
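One thing worth checking on a Spark 2 build: the old `s3n` connector is deprecated in newer Hadoop versions, and its keys are ignored by the maintained `s3a` connector. A hedged config sketch (requires `hadoop-aws` and the matching AWS SDK on the classpath; bucket name and key values are placeholders):

```scala
// s3a equivalents of the s3n keys above.
sc.hadoopConfiguration.set("fs.s3a.access.key", "xxx")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "xxx")

// Hypothetical read; note the s3a:// scheme rather than s3n://.
val df = spark.read.text("s3a://my-bucket/path/")
```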