[jira] [Commented] (SPARK-5019) Update GMM API to use MultivariateGaussian
[ https://issues.apache.org/jira/browse/SPARK-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281543#comment-14281543 ] Travis Galoppo commented on SPARK-5019: --- This ticket is currently stalling SPARK-5012. Update GMM API to use MultivariateGaussian -- Key: SPARK-5019 URL: https://issues.apache.org/jira/browse/SPARK-5019 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.2.0 Reporter: Joseph K. Bradley Priority: Blocker The GaussianMixtureModel API should expose MultivariateGaussian instances instead of the means and covariances. This should be fixed as soon as possible to stabilize the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
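A rough sketch of the shape of the change this ticket asks for (the trait shapes, field names and package below are assumptions, not the final API): the model would expose each mixture component's mean and covariance together through MultivariateGaussian instances instead of separate arrays.
{code}
import org.apache.spark.mllib.linalg.{Matrix, Vector}

// Assumed shape only; names are illustrative, not the committed API.
trait MultivariateGaussian { def mu: Vector; def sigma: Matrix }
trait GaussianMixtureModel {
  def weights: Array[Double]
  def gaussians: Array[MultivariateGaussian]   // instead of separate means / covariances
}

// A caller would then get each component's mean and covariance together:
def describe(gmm: GaussianMixtureModel): Unit =
  gmm.gaussians.zip(gmm.weights).foreach { case (g, w) =>
    println(s"weight=$w mean=${g.mu} covariance=${g.sigma}")
  }
{code}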
[jira] [Commented] (SPARK-5298) Spark not starting on EC2 using spark-ec2
[ https://issues.apache.org/jira/browse/SPARK-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281564#comment-14281564 ] Grzegorz Dubicki commented on SPARK-5298: - Btw: I used the fork because I was misled by GitHub, which says that mesos/spark-ec2 [is] forked from shivaram/spark-ec2 on https://github.com/mesos/spark-ec2 - so I assumed that shivaram/spark-ec2 was the source, i.e. the newer, official version.. Spark not starting on EC2 using spark-ec2 - Key: SPARK-5298 URL: https://issues.apache.org/jira/browse/SPARK-5298 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Environment: I use Spark 1.2.0 + this PR https://github.com/mesos/spark-ec2/pull/76 from my fork https://github.com/grzegorz-dubicki/spark and v4 Spark EC2 script with the same fix from https://github.com/grzegorz-dubicki/spark-ec2 Reporter: Grzegorz Dubicki Spark doesn't start after creating it with: {noformat} ./spark-ec2 -k * -i * -s 1 --region=eu-west-1 --instance-type=t2.micro --spark-version=1.2.0 launch test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/f15caf9ff6c96ec69fee) ..or after stopping the instances on EC2 via AWS Console and starting the cluster with: {noformat} ./spark-ec2 -k * -i * --region=eu-west-1 start test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/8b87192b3aa4e0ed028c) Please note these errors in launch output: {noformat} ~/spark-ec2 Initializing spark ~ ~/spark-ec2 ERROR: Unknown Spark version Initializing shark ~ ~/spark-ec2 ~/spark-ec2 ERROR: Unknown Shark version {noformat} ..and then these in start output: {noformat} ./spark-standalone/setup.sh: line 26: /root/spark/sbin/stop-all.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 31: /root/spark/sbin/start-master.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 37: /root/spark/sbin/start-slaves.sh: Nie ma takiego pliku ani katalogu {noformat} (the error message is No such file or directory, in Polish) It seems to be related with http://mail-archives.us.apache.org/mod_mbox/spark-user/201412.mbox/%3cCAJ5A9B_U=mdcxyftdkbk+sljzbcdpcb0qqs83u0grozfgkc...@mail.gmail.com%3e - I also have almost empty Spark and Shark dirs on the master of test2 cluster: {noformat} root@ip-172-31-7-179 ~]$ ls spark conf work root@ip-172-31-7-179 ~]$ ls shark/ conf {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-5298) Spark not starting on EC2 using spark-ec2
[ https://issues.apache.org/jira/browse/SPARK-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281535#comment-14281535 ] Grzegorz Dubicki edited comment on SPARK-5298 at 1/17/15 8:53 PM: -- Ad. 1. I am sorry, I have not noticed the warnings. I would not use unsupported instance if I would knew that. It would be nice if the script would ask me something like Not supported instance type. Continue anyway?... But switching to m3.medium didn't help. Launch output still includes the ERROR: Unknown Spark version message. See it whole here: https://gist.github.com/grzegorz-dubicki/4959eb97f9b1ca8e00ad And still there is actually no Spark on the master: {noformat} root@ip-172-31-47-137 ~]$ ls spark conf work {noformat} Trying to apply your suggestion no 2... was (Author: grzegorz-dubicki): Ad. 1. I am sorry, I have not noticed the warnings. I would not use unsupported instance if I would knew that. It would be nice if the script would ask me something like Not supported instance type. Continue anyway?... But switching to m3.medium didn't help. Launch output still includes the ERROR: Unknown Spark version message. See it whole here: https://gist.github.com/grzegorz-dubicki/4959eb97f9b1ca8e00ad And still there is actually no Spark on the master: {noformat} root@ip-172-31-47-137 ~]$ ls spark conf work {noformat} Trying to apply your suggestion no 2... Spark not starting on EC2 using spark-ec2 - Key: SPARK-5298 URL: https://issues.apache.org/jira/browse/SPARK-5298 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Environment: I use Spark 1.2.0 + this PR https://github.com/mesos/spark-ec2/pull/76 from my fork https://github.com/grzegorz-dubicki/spark and v4 Spark EC2 script with the same fix from https://github.com/grzegorz-dubicki/spark-ec2 Reporter: Grzegorz Dubicki Spark doesn't start after creating it with: {noformat} ./spark-ec2 -k * -i * -s 1 --region=eu-west-1 --instance-type=t2.micro --spark-version=1.2.0 launch test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/f15caf9ff6c96ec69fee) ..or after stopping the instances on EC2 via AWS Console and starting the cluster with: {noformat} ./spark-ec2 -k * -i * --region=eu-west-1 start test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/8b87192b3aa4e0ed028c) Please note these errors in launch output: {noformat} ~/spark-ec2 Initializing spark ~ ~/spark-ec2 ERROR: Unknown Spark version Initializing shark ~ ~/spark-ec2 ~/spark-ec2 ERROR: Unknown Shark version {noformat} ..and then these in start output: {noformat} ./spark-standalone/setup.sh: line 26: /root/spark/sbin/stop-all.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 31: /root/spark/sbin/start-master.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 37: /root/spark/sbin/start-slaves.sh: Nie ma takiego pliku ani katalogu {noformat} (the error message is No such file or directory, in Polish) It seems to be related with http://mail-archives.us.apache.org/mod_mbox/spark-user/201412.mbox/%3cCAJ5A9B_U=mdcxyftdkbk+sljzbcdpcb0qqs83u0grozfgkc...@mail.gmail.com%3e - I also have almost empty Spark and Shark dirs on the master of test2 cluster: {noformat} root@ip-172-31-7-179 ~]$ ls spark conf work root@ip-172-31-7-179 ~]$ ls shark/ conf {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5298) Spark not starting on EC2 using spark-ec2
[ https://issues.apache.org/jira/browse/SPARK-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281540#comment-14281540 ] Nicholas Chammas commented on SPARK-5298: - Ah, I found the issue. You have an outdated fork of {{mesos/spark-ec2}}. See here: https://github.com/grzegorz-dubicki/spark-ec2/blob/b388d5b22462d4b5bfc9f021f160cd438c98f2c1/spark/init.sh#L98 Please re-fork the {{v4}} branch and try again. Correct version: https://github.com/mesos/spark-ec2/blob/c8b470929838132cae6f9872eeb459d7924f1978/spark/init.sh#L105 Spark not starting on EC2 using spark-ec2 - Key: SPARK-5298 URL: https://issues.apache.org/jira/browse/SPARK-5298 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Environment: I use Spark 1.2.0 + this PR https://github.com/mesos/spark-ec2/pull/76 from my fork https://github.com/grzegorz-dubicki/spark and v4 Spark EC2 script with the same fix from https://github.com/grzegorz-dubicki/spark-ec2 Reporter: Grzegorz Dubicki Spark doesn't start after creating it with: {noformat} ./spark-ec2 -k * -i * -s 1 --region=eu-west-1 --instance-type=t2.micro --spark-version=1.2.0 launch test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/f15caf9ff6c96ec69fee) ..or after stopping the instances on EC2 via AWS Console and starting the cluster with: {noformat} ./spark-ec2 -k * -i * --region=eu-west-1 start test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/8b87192b3aa4e0ed028c) Please note these errors in launch output: {noformat} ~/spark-ec2 Initializing spark ~ ~/spark-ec2 ERROR: Unknown Spark version Initializing shark ~ ~/spark-ec2 ~/spark-ec2 ERROR: Unknown Shark version {noformat} ..and then these in start output: {noformat} ./spark-standalone/setup.sh: line 26: /root/spark/sbin/stop-all.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 31: /root/spark/sbin/start-master.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 37: /root/spark/sbin/start-slaves.sh: Nie ma takiego pliku ani katalogu {noformat} (the error message is No such file or directory, in Polish) It seems to be related with http://mail-archives.us.apache.org/mod_mbox/spark-user/201412.mbox/%3cCAJ5A9B_U=mdcxyftdkbk+sljzbcdpcb0qqs83u0grozfgkc...@mail.gmail.com%3e - I also have almost empty Spark and Shark dirs on the master of test2 cluster: {noformat} root@ip-172-31-7-179 ~]$ ls spark conf work root@ip-172-31-7-179 ~]$ ls shark/ conf {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5298) Spark not starting on EC2 using spark-ec2
[ https://issues.apache.org/jira/browse/SPARK-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas resolved SPARK-5298. - Resolution: Invalid I'm resolving this as invalid. If you believe this is incorrect, please feel free to reopen with clarification. Spark not starting on EC2 using spark-ec2 - Key: SPARK-5298 URL: https://issues.apache.org/jira/browse/SPARK-5298 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Environment: I use Spark 1.2.0 + this PR https://github.com/mesos/spark-ec2/pull/76 from my fork https://github.com/grzegorz-dubicki/spark and v4 Spark EC2 script with the same fix from https://github.com/grzegorz-dubicki/spark-ec2 Reporter: Grzegorz Dubicki Spark doesn't start after creating it with: {noformat} ./spark-ec2 -k * -i * -s 1 --region=eu-west-1 --instance-type=t2.micro --spark-version=1.2.0 launch test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/f15caf9ff6c96ec69fee) ..or after stopping the instances on EC2 via AWS Console and starting the cluster with: {noformat} ./spark-ec2 -k * -i * --region=eu-west-1 start test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/8b87192b3aa4e0ed028c) Please note these errors in launch output: {noformat} ~/spark-ec2 Initializing spark ~ ~/spark-ec2 ERROR: Unknown Spark version Initializing shark ~ ~/spark-ec2 ~/spark-ec2 ERROR: Unknown Shark version {noformat} ..and then these in start output: {noformat} ./spark-standalone/setup.sh: line 26: /root/spark/sbin/stop-all.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 31: /root/spark/sbin/start-master.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 37: /root/spark/sbin/start-slaves.sh: Nie ma takiego pliku ani katalogu {noformat} (the error message is No such file or directory, in Polish) It seems to be related with http://mail-archives.us.apache.org/mod_mbox/spark-user/201412.mbox/%3cCAJ5A9B_U=mdcxyftdkbk+sljzbcdpcb0qqs83u0grozfgkc...@mail.gmail.com%3e - I also have almost empty Spark and Shark dirs on the master of test2 cluster: {noformat} root@ip-172-31-7-179 ~]$ ls spark conf work root@ip-172-31-7-179 ~]$ ls shark/ conf {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5299) Is http://www.apache.org/dist/spark/KEYS out of date?
[ https://issues.apache.org/jira/browse/SPARK-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281511#comment-14281511 ] Nicholas Chammas commented on SPARK-5299: - cc [~pwendell] Is http://www.apache.org/dist/spark/KEYS out of date? - Key: SPARK-5299 URL: https://issues.apache.org/jira/browse/SPARK-5299 Project: Spark Issue Type: Question Components: Deploy Reporter: David Shaw The keys contained in http://www.apache.org/dist/spark/KEYS do not appear to match the keys used to sign the releases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5298) Spark not starting on EC2 using spark-ec2
[ https://issues.apache.org/jira/browse/SPARK-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281514#comment-14281514 ] Nicholas Chammas commented on SPARK-5298: - A few questions for you: 1. What happens if you try to launch on {{m3.medium}} instances? {{t2.micro}} is not fully supported by {{spark-ec2}}, as the warning hints at. The error about Shark is harmless since Shark doesn't exist as of 1.2.0. This error won't show up anymore in 1.3.0. The error about Spark is strange since you passed in the version correctly. 2. What happens if you launch without explicitly setting the version? 3. What happens if you launch into the {{us-east-1}} region? Spark not starting on EC2 using spark-ec2 - Key: SPARK-5298 URL: https://issues.apache.org/jira/browse/SPARK-5298 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Environment: I use Spark 1.2.0 + this PR https://github.com/mesos/spark-ec2/pull/76 from my fork https://github.com/grzegorz-dubicki/spark and v4 Spark EC2 script with the same fix from https://github.com/grzegorz-dubicki/spark-ec2 Reporter: Grzegorz Dubicki Spark doesn't start after creating it with: {noformat} ./spark-ec2 -k * -i * -s 1 --region=eu-west-1 --instance-type=t2.micro --spark-version=1.2.0 launch test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/f15caf9ff6c96ec69fee) ..or after stopping the instances on EC2 via AWS Console and starting the cluster with: {noformat} ./spark-ec2 -k * -i * --region=eu-west-1 start test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/8b87192b3aa4e0ed028c) Please note these errors in launch output: {noformat} ~/spark-ec2 Initializing spark ~ ~/spark-ec2 ERROR: Unknown Spark version Initializing shark ~ ~/spark-ec2 ~/spark-ec2 ERROR: Unknown Shark version {noformat} ..and then these in start output: {noformat} ./spark-standalone/setup.sh: line 26: /root/spark/sbin/stop-all.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 31: /root/spark/sbin/start-master.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 37: /root/spark/sbin/start-slaves.sh: Nie ma takiego pliku ani katalogu {noformat} (the error message is No such file or directory, in Polish) It seems to be related with http://mail-archives.us.apache.org/mod_mbox/spark-user/201412.mbox/%3cCAJ5A9B_U=mdcxyftdkbk+sljzbcdpcb0qqs83u0grozfgkc...@mail.gmail.com%3e - I also have almost empty Spark and Shark dirs on the master of test2 cluster: {noformat} root@ip-172-31-7-179 ~]$ ls spark conf work root@ip-172-31-7-179 ~]$ ls shark/ conf {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5298) Spark not starting on EC2 using spark-ec2
[ https://issues.apache.org/jira/browse/SPARK-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281535#comment-14281535 ] Grzegorz Dubicki commented on SPARK-5298: - Re 1: I am sorry, I had not noticed the warnings. I would not have used an unsupported instance type if I had known that. It would be nice if the script asked something like "Unsupported instance type. Continue anyway?"... But switching to m3.medium didn't help. The launch output still includes the ERROR: Unknown Spark version message. See the whole output here: https://gist.github.com/grzegorz-dubicki/4959eb97f9b1ca8e00ad And there is still effectively no Spark on the master: {noformat} root@ip-172-31-47-137 ~]$ ls spark conf work {noformat} Trying to apply your suggestion no. 2... Spark not starting on EC2 using spark-ec2 - Key: SPARK-5298 URL: https://issues.apache.org/jira/browse/SPARK-5298 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Environment: I use Spark 1.2.0 + this PR https://github.com/mesos/spark-ec2/pull/76 from my fork https://github.com/grzegorz-dubicki/spark and v4 Spark EC2 script with the same fix from https://github.com/grzegorz-dubicki/spark-ec2 Reporter: Grzegorz Dubicki Spark doesn't start after creating it with: {noformat} ./spark-ec2 -k * -i * -s 1 --region=eu-west-1 --instance-type=t2.micro --spark-version=1.2.0 launch test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/f15caf9ff6c96ec69fee) ..or after stopping the instances on EC2 via AWS Console and starting the cluster with: {noformat} ./spark-ec2 -k * -i * --region=eu-west-1 start test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/8b87192b3aa4e0ed028c) Please note these errors in launch output: {noformat} ~/spark-ec2 Initializing spark ~ ~/spark-ec2 ERROR: Unknown Spark version Initializing shark ~ ~/spark-ec2 ~/spark-ec2 ERROR: Unknown Shark version {noformat} ..and then these in start output: {noformat} ./spark-standalone/setup.sh: line 26: /root/spark/sbin/stop-all.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 31: /root/spark/sbin/start-master.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 37: /root/spark/sbin/start-slaves.sh: Nie ma takiego pliku ani katalogu {noformat} (the error message is No such file or directory, in Polish) It seems to be related with http://mail-archives.us.apache.org/mod_mbox/spark-user/201412.mbox/%3cCAJ5A9B_U=mdcxyftdkbk+sljzbcdpcb0qqs83u0grozfgkc...@mail.gmail.com%3e - I also have almost empty Spark and Shark dirs on the master of test2 cluster: {noformat} root@ip-172-31-7-179 ~]$ ls spark conf work root@ip-172-31-7-179 ~]$ ls shark/ conf {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5019) Update GMM API to use MultivariateGaussian
[ https://issues.apache.org/jira/browse/SPARK-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281565#comment-14281565 ] Joseph K. Bradley commented on SPARK-5019: -- [~tgaloppo] I'd recommend going ahead and submitting a PR if you have it prepared. It will be good to finalize soon since the code freeze for the next release is scheduled to be at the end of this month. Thanks for being patient! Update GMM API to use MultivariateGaussian -- Key: SPARK-5019 URL: https://issues.apache.org/jira/browse/SPARK-5019 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.2.0 Reporter: Joseph K. Bradley Priority: Blocker The GaussianMixtureModel API should expose MultivariateGaussian instances instead of the means and covariances. This should be fixed as soon as possible to stabilize the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5298) Spark not starting on EC2 using spark-ec2
[ https://issues.apache.org/jira/browse/SPARK-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281557#comment-14281557 ] Grzegorz Dubicki commented on SPARK-5298: - Ad. 2. No progress. I put the output in the same gist as previously as a new commit for a free diff. Spark not starting on EC2 using spark-ec2 - Key: SPARK-5298 URL: https://issues.apache.org/jira/browse/SPARK-5298 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Environment: I use Spark 1.2.0 + this PR https://github.com/mesos/spark-ec2/pull/76 from my fork https://github.com/grzegorz-dubicki/spark and v4 Spark EC2 script with the same fix from https://github.com/grzegorz-dubicki/spark-ec2 Reporter: Grzegorz Dubicki Spark doesn't start after creating it with: {noformat} ./spark-ec2 -k * -i * -s 1 --region=eu-west-1 --instance-type=t2.micro --spark-version=1.2.0 launch test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/f15caf9ff6c96ec69fee) ..or after stopping the instances on EC2 via AWS Console and starting the cluster with: {noformat} ./spark-ec2 -k * -i * --region=eu-west-1 start test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/8b87192b3aa4e0ed028c) Please note these errors in launch output: {noformat} ~/spark-ec2 Initializing spark ~ ~/spark-ec2 ERROR: Unknown Spark version Initializing shark ~ ~/spark-ec2 ~/spark-ec2 ERROR: Unknown Shark version {noformat} ..and then these in start output: {noformat} ./spark-standalone/setup.sh: line 26: /root/spark/sbin/stop-all.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 31: /root/spark/sbin/start-master.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 37: /root/spark/sbin/start-slaves.sh: Nie ma takiego pliku ani katalogu {noformat} (the error message is No such file or directory, in Polish) It seems to be related with http://mail-archives.us.apache.org/mod_mbox/spark-user/201412.mbox/%3cCAJ5A9B_U=mdcxyftdkbk+sljzbcdpcb0qqs83u0grozfgkc...@mail.gmail.com%3e - I also have almost empty Spark and Shark dirs on the master of test2 cluster: {noformat} root@ip-172-31-7-179 ~]$ ls spark conf work root@ip-172-31-7-179 ~]$ ls shark/ conf {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-5298) Spark not starting on EC2 using spark-ec2
[ https://issues.apache.org/jira/browse/SPARK-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281557#comment-14281557 ] Grzegorz Dubicki edited comment on SPARK-5298 at 1/17/15 9:15 PM: -- EDIT: Thank you, I will try to update my fork. was (Author: grzegorz-dubicki): Ad. 2. No progress. I put the output in the same gist as previously as a new commit for a free diff. Spark not starting on EC2 using spark-ec2 - Key: SPARK-5298 URL: https://issues.apache.org/jira/browse/SPARK-5298 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Environment: I use Spark 1.2.0 + this PR https://github.com/mesos/spark-ec2/pull/76 from my fork https://github.com/grzegorz-dubicki/spark and v4 Spark EC2 script with the same fix from https://github.com/grzegorz-dubicki/spark-ec2 Reporter: Grzegorz Dubicki Spark doesn't start after creating it with: {noformat} ./spark-ec2 -k * -i * -s 1 --region=eu-west-1 --instance-type=t2.micro --spark-version=1.2.0 launch test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/f15caf9ff6c96ec69fee) ..or after stopping the instances on EC2 via AWS Console and starting the cluster with: {noformat} ./spark-ec2 -k * -i * --region=eu-west-1 start test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/8b87192b3aa4e0ed028c) Please note these errors in launch output: {noformat} ~/spark-ec2 Initializing spark ~ ~/spark-ec2 ERROR: Unknown Spark version Initializing shark ~ ~/spark-ec2 ~/spark-ec2 ERROR: Unknown Shark version {noformat} ..and then these in start output: {noformat} ./spark-standalone/setup.sh: line 26: /root/spark/sbin/stop-all.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 31: /root/spark/sbin/start-master.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 37: /root/spark/sbin/start-slaves.sh: Nie ma takiego pliku ani katalogu {noformat} (the error message is No such file or directory, in Polish) It seems to be related with http://mail-archives.us.apache.org/mod_mbox/spark-user/201412.mbox/%3cCAJ5A9B_U=mdcxyftdkbk+sljzbcdpcb0qqs83u0grozfgkc...@mail.gmail.com%3e - I also have almost empty Spark and Shark dirs on the master of test2 cluster: {noformat} root@ip-172-31-7-179 ~]$ ls spark conf work root@ip-172-31-7-179 ~]$ ls shark/ conf {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-5298) Spark not starting on EC2 using spark-ec2
[ https://issues.apache.org/jira/browse/SPARK-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grzegorz Dubicki closed SPARK-5298. --- Switching to mesos/spark-ec2 as a base of my fork helped. :) Spark not starting on EC2 using spark-ec2 - Key: SPARK-5298 URL: https://issues.apache.org/jira/browse/SPARK-5298 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Environment: I use Spark 1.2.0 + this PR https://github.com/mesos/spark-ec2/pull/76 from my fork https://github.com/grzegorz-dubicki/spark and v4 Spark EC2 script with the same fix from https://github.com/grzegorz-dubicki/spark-ec2 Reporter: Grzegorz Dubicki Spark doesn't start after creating it with: {noformat} ./spark-ec2 -k * -i * -s 1 --region=eu-west-1 --instance-type=t2.micro --spark-version=1.2.0 launch test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/f15caf9ff6c96ec69fee) ..or after stopping the instances on EC2 via AWS Console and starting the cluster with: {noformat} ./spark-ec2 -k * -i * --region=eu-west-1 start test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/8b87192b3aa4e0ed028c) Please note these errors in launch output: {noformat} ~/spark-ec2 Initializing spark ~ ~/spark-ec2 ERROR: Unknown Spark version Initializing shark ~ ~/spark-ec2 ~/spark-ec2 ERROR: Unknown Shark version {noformat} ..and then these in start output: {noformat} ./spark-standalone/setup.sh: line 26: /root/spark/sbin/stop-all.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 31: /root/spark/sbin/start-master.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 37: /root/spark/sbin/start-slaves.sh: Nie ma takiego pliku ani katalogu {noformat} (the error message is No such file or directory, in Polish) It seems to be related with http://mail-archives.us.apache.org/mod_mbox/spark-user/201412.mbox/%3cCAJ5A9B_U=mdcxyftdkbk+sljzbcdpcb0qqs83u0grozfgkc...@mail.gmail.com%3e - I also have almost empty Spark and Shark dirs on the master of test2 cluster: {noformat} root@ip-172-31-7-179 ~]$ ls spark conf work root@ip-172-31-7-179 ~]$ ls shark/ conf {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5019) Update GMM API to use MultivariateGaussian
[ https://issues.apache.org/jira/browse/SPARK-5019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281570#comment-14281570 ] Apache Spark commented on SPARK-5019: - User 'tgaloppo' has created a pull request for this issue: https://github.com/apache/spark/pull/4088 Update GMM API to use MultivariateGaussian -- Key: SPARK-5019 URL: https://issues.apache.org/jira/browse/SPARK-5019 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.2.0 Reporter: Joseph K. Bradley Priority: Blocker The GaussianMixtureModel API should expose MultivariateGaussian instances instead of the means and covariances. This should be fixed as soon as possible to stabilize the API. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5298) Spark not starting on EC2 using spark-ec2
[ https://issues.apache.org/jira/browse/SPARK-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281575#comment-14281575 ] Nicholas Chammas commented on SPARK-5298: - Yes, {{mesos/spark-ec2}} is the official repo. You'll see that {{spark-ec2}} from the main Spark repository points to it. Spark not starting on EC2 using spark-ec2 - Key: SPARK-5298 URL: https://issues.apache.org/jira/browse/SPARK-5298 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Environment: I use Spark 1.2.0 + this PR https://github.com/mesos/spark-ec2/pull/76 from my fork https://github.com/grzegorz-dubicki/spark and v4 Spark EC2 script with the same fix from https://github.com/grzegorz-dubicki/spark-ec2 Reporter: Grzegorz Dubicki Spark doesn't start after creating it with: {noformat} ./spark-ec2 -k * -i * -s 1 --region=eu-west-1 --instance-type=t2.micro --spark-version=1.2.0 launch test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/f15caf9ff6c96ec69fee) ..or after stopping the instances on EC2 via AWS Console and starting the cluster with: {noformat} ./spark-ec2 -k * -i * --region=eu-west-1 start test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/8b87192b3aa4e0ed028c) Please note these errors in launch output: {noformat} ~/spark-ec2 Initializing spark ~ ~/spark-ec2 ERROR: Unknown Spark version Initializing shark ~ ~/spark-ec2 ~/spark-ec2 ERROR: Unknown Shark version {noformat} ..and then these in start output: {noformat} ./spark-standalone/setup.sh: line 26: /root/spark/sbin/stop-all.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 31: /root/spark/sbin/start-master.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 37: /root/spark/sbin/start-slaves.sh: Nie ma takiego pliku ani katalogu {noformat} (the error message is No such file or directory, in Polish) It seems to be related with http://mail-archives.us.apache.org/mod_mbox/spark-user/201412.mbox/%3cCAJ5A9B_U=mdcxyftdkbk+sljzbcdpcb0qqs83u0grozfgkc...@mail.gmail.com%3e - I also have almost empty Spark and Shark dirs on the master of test2 cluster: {noformat} root@ip-172-31-7-179 ~]$ ls spark conf work root@ip-172-31-7-179 ~]$ ls shark/ conf {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5301) Add missing linear algebra utilities to IndexedRowMatrix and CoordinateMatrix
Reza Zadeh created SPARK-5301: - Summary: Add missing linear algebra utilities to IndexedRowMatrix and CoordinateMatrix Key: SPARK-5301 URL: https://issues.apache.org/jira/browse/SPARK-5301 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Reza Zadeh -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5301) Add missing linear algebra utilities to IndexedRowMatrix and CoordinateMatrix
[ https://issues.apache.org/jira/browse/SPARK-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reza Zadeh updated SPARK-5301: -- Description: 1) Transpose is missing from CoordinateMatrix (this is cheap to compute, so it should be there) 2) IndexedRowMatrix should be convertible to CoordinateMatrix (conversion method to be added) Add missing linear algebra utilities to IndexedRowMatrix and CoordinateMatrix - Key: SPARK-5301 URL: https://issues.apache.org/jira/browse/SPARK-5301 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Reza Zadeh 1) Transpose is missing from CoordinateMatrix (this is cheap to compute, so it should be there) 2) IndexedRowMatrix should be convertible to CoordinateMatrix (conversion method to be added) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
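A minimal sketch of the two utilities described above, written as standalone helpers against the existing distributed-matrix API (the eventual methods added to the classes may look different):
{code}
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, IndexedRowMatrix, MatrixEntry}

// 1) Transpose of a CoordinateMatrix: swap the row and column index of each entry.
def transpose(mat: CoordinateMatrix): CoordinateMatrix =
  new CoordinateMatrix(
    mat.entries.map(e => MatrixEntry(e.j, e.i, e.value)),
    mat.numCols(), mat.numRows())

// 2) Convert an IndexedRowMatrix to a CoordinateMatrix by expanding each row
//    into (rowIndex, colIndex, value) entries (zero filtering omitted for brevity).
def toCoordinateMatrix(mat: IndexedRowMatrix): CoordinateMatrix = {
  val entries = mat.rows.flatMap { row =>
    row.vector.toArray.zipWithIndex.map { case (v, j) => MatrixEntry(row.index, j, v) }
  }
  new CoordinateMatrix(entries, mat.numRows(), mat.numCols())
}
{code}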
[jira] [Commented] (SPARK-5301) Add missing linear algebra utilities to IndexedRowMatrix and CoordinateMatrix
[ https://issues.apache.org/jira/browse/SPARK-5301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281577#comment-14281577 ] Apache Spark commented on SPARK-5301: - User 'rezazadeh' has created a pull request for this issue: https://github.com/apache/spark/pull/4089 Add missing linear algebra utilities to IndexedRowMatrix and CoordinateMatrix - Key: SPARK-5301 URL: https://issues.apache.org/jira/browse/SPARK-5301 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Reza Zadeh 1) Transpose is missing from CoordinateMatrix (this is cheap to compute, so it should be there) 2) IndexedRowMatrix should be convertible to CoordinateMatrix (conversion method to be added) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5300) Spark loads file partitions in inconsistent order on native filesystems
Ewan Higgs created SPARK-5300: - Summary: Spark loads file partitions in inconsistent order on native filesystems Key: SPARK-5300 URL: https://issues.apache.org/jira/browse/SPARK-5300 Project: Spark Issue Type: Bug Components: Input/Output Affects Versions: 1.2.0, 1.1.0 Environment: Linux, EXT4, for example. Reporter: Ewan Higgs Discussed on user list in April 2014: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-reads-partitions-in-a-wrong-order-td4818.html And on dev list January 2015: http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-order-guarantees-td10142.html When using a file system which isn't HDFS, file partitions ('part-0, part-1', etc.) are not guaranteed to load in the same order. This means previously sorted RDDs will be loaded out of order. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
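A minimal illustration of controlling the input file order explicitly (a local filesystem directory of part files is assumed). It only controls the order in which the files are handed to Spark; whether ordering is preserved end-to-end is exactly what this ticket is about, so treat it as a sketch rather than a fix.
{code}
import java.io.File
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// List the part files, sort them by name, and pass them to textFile in that
// explicit order instead of relying on the filesystem's directory listing order.
def readPartsSorted(sc: SparkContext, dir: String): RDD[String] = {
  val parts = new File(dir).listFiles()
    .filter(_.getName.startsWith("part-"))
    .map(_.getPath)
    .sorted
  sc.textFile(parts.mkString(","))   // textFile accepts a comma-separated list of paths
}
{code}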
[jira] [Updated] (SPARK-4937) Adding optimization to simplify the filter condition
[ https://issues.apache.org/jira/browse/SPARK-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-4937: --- Description: Adding optimization to simplify the filter condition: 1. Conditions that can be folded to a constant boolean result, such as: {code} a < 3 && a > 5 => false; a < 1 || a > 0 => true {code} 2. Simplify And/Or conditions, such as in this sql (one of the hive-testbench queries): {code} select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#32' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 7 and l_quantity <= 7 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#35' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 15 and l_quantity <= 15 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#24' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 26 and l_quantity <= 26 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ); {code} Before the optimization the plan is a CartesianProduct; in my local test this sql hangs and cannot produce a result. After the optimization the CartesianProduct is replaced by a ShuffledHashJoin, which needs only 20+ seconds to run this sql. was: Adding optimization to simplify the filter condition: 1. Conditions that can be folded to a constant boolean result, such as: a < 3 && a > 5 => false; a < 1 || a > 0 => true 2. Simplify And/Or conditions, such as in this sql (one of the hive-testbench queries): select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#32' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 7 and l_quantity <= 7 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#35' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 15 and l_quantity <= 15 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#24' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 26 and l_quantity <= 26 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ); Before the optimization the plan is a CartesianProduct; in my local test this sql hangs and cannot produce a result. After the optimization the CartesianProduct is replaced by a ShuffledHashJoin, which needs only 20+ seconds to run this sql. 
Adding optimization to simplify the filter condition Key: SPARK-4937 URL: https://issues.apache.org/jira/browse/SPARK-4937 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Assignee: Cheng Lian Fix For: 1.3.0 Adding optimization to simplify the filter condition: 1. Conditions that can be folded to a constant boolean result, such as: {code} a < 3 && a > 5 => false; a < 1 || a > 0 => true {code} 2. Simplify And/Or conditions, such as in this sql (one of the hive-testbench queries): {code} select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#32' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity >= 7 and l_quantity <= 7 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#35' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity >= 15 and l_quantity <= 15 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#24' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity >= 26
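A toy illustration of the first kind of simplification described above, using a small stand-in expression type rather than the actual Catalyst rule:
{code}
// Tiny expression type plus a rewrite that folds contradictory / tautological
// range predicates on the same attribute into boolean literals.
sealed trait Expr
case class Lt(attr: String, v: Double) extends Expr   // attr < v
case class Gt(attr: String, v: Double) extends Expr   // attr > v
case class And(l: Expr, r: Expr) extends Expr
case class Or(l: Expr, r: Expr) extends Expr
case class Lit(b: Boolean) extends Expr

def simplify(e: Expr): Expr = e match {
  case And(l, r) => (simplify(l), simplify(r)) match {
    // attr < x && attr > y is unsatisfiable when x <= y
    case (Lt(a1, x), Gt(a2, y)) if a1 == a2 && x <= y => Lit(false)
    case (sl, sr) => And(sl, sr)
  }
  case Or(l, r) => (simplify(l), simplify(r)) match {
    // attr < x || attr > y always holds when x > y
    case (Lt(a1, x), Gt(a2, y)) if a1 == a2 && x > y => Lit(true)
    case (sl, sr) => Or(sl, sr)
  }
  case other => other
}

// simplify(And(Lt("a", 3), Gt("a", 5)))  ==> Lit(false)
// simplify(Or(Lt("a", 1), Gt("a", 0)))   ==> Lit(true)
{code}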
[jira] [Created] (SPARK-5305) Using a field in a WHERE clause that is not in the schema does not throw an exception.
Corey J. Nolet created SPARK-5305: - Summary: Using a field in a WHERE clause that is not in the schema does not throw an exception. Key: SPARK-5305 URL: https://issues.apache.org/jira/browse/SPARK-5305 Project: Spark Issue Type: Bug Components: SQL Reporter: Corey J. Nolet Given a schema: key1 = String key2 = Integer The following sql statement doesn't seem to throw an exception: SELECT * FROM myTable WHERE doesntExist = 'val1' -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
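A minimal reproduction sketch, assuming the 1.2-era SQLContext API; the table name, case class, and data are illustrative:
{code}
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

case class Rec(key1: String, key2: Int)

def reproduce(sc: SparkContext): Unit = {
  val sqlContext = new SQLContext(sc)
  import sqlContext.createSchemaRDD   // implicit RDD[Product] -> SchemaRDD

  sc.parallelize(Seq(Rec("a", 1), Rec("b", 2))).registerTempTable("myTable")

  // Expected: an analysis error for the unknown column `doesntExist`.
  // Reported behaviour: no exception is thrown here.
  val result = sqlContext.sql("SELECT * FROM myTable WHERE doesntExist = 'val1'")
  result.collect().foreach(println)
}
{code}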
[jira] [Updated] (SPARK-5302) Add support for SQLContext partition columns
[ https://issues.apache.org/jira/browse/SPARK-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bob Tiernay updated SPARK-5302: --- Description: For {{SQLContext}} (not {{HiveContext}}) it would be very convenient to support a virtual column that maps to part of the the file path, similar to what is done in Hive for partitions. The API could allow the user to type the column using an appropriate {{DataType}} instance. This new field could be addressed in SQL statements much the same as is done in Hive. As a consequence, pruning of partitions could be possible when executing a query and also remove the need to materialize a column in each logical partition that is already encoded in the path name. Furthermore, this would provide an nice interop and migration strategy for Hive users who may one day use {{SQLContext}} directly. (was: For {{SQLContext}} (not {{HiveContext}}) it would be very convenient to support a virtual column that maps to part of the the file path, similar to what is done in Hive for partitions. The API could allow the user to type the column using an appropriate {{DataType}} instance. This new field could be addressed in SQL statements much the same as is done in Hive. As a consequence, this would provide an nice interop and migration strategy for Hive users who may one day use {{SQLContext}} directly.) Add support for SQLContext partition columns -- Key: SPARK-5302 URL: https://issues.apache.org/jira/browse/SPARK-5302 Project: Spark Issue Type: New Feature Components: SQL Reporter: Bob Tiernay For {{SQLContext}} (not {{HiveContext}}) it would be very convenient to support a virtual column that maps to part of the the file path, similar to what is done in Hive for partitions. The API could allow the user to type the column using an appropriate {{DataType}} instance. This new field could be addressed in SQL statements much the same as is done in Hive. As a consequence, pruning of partitions could be possible when executing a query and also remove the need to materialize a column in each logical partition that is already encoded in the path name. Furthermore, this would provide an nice interop and migration strategy for Hive users who may one day use {{SQLContext}} directly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
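A sketch of the manual workaround this feature would automate: reading each Hive-style partition directory separately and tagging its records with the value parsed from the path (the paths, dates, and pair layout are illustrative):
{code}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Returns (dt, line) pairs so the virtual "dt" column is available to queries
// without being materialized inside the files themselves.
def readWithPartitionColumn(sc: SparkContext, base: String, dates: Seq[String]): RDD[(String, String)] =
  dates.map { dt =>
    sc.textFile(s"$base/dt=$dt").map(line => (dt, line))
  }.reduce(_ union _)

// val clicks = readWithPartitionColumn(sc, "/data/clicks", Seq("2015-01-01", "2015-01-02"))
{code}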
[jira] [Closed] (SPARK-5304) applySchema returns NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mauro Pirrone closed SPARK-5304. Resolution: Duplicate applySchema returns NullPointerException Key: SPARK-5304 URL: https://issues.apache.org/jira/browse/SPARK-5304 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Mauro Pirrone The following code snippet returns NullPointerException: val result = . val rows = result.take(10) val rowRdd = SparkManager.getContext().parallelize(rows, 1) val schemaRdd = SparkManager.getSQLContext().applySchema(rowRdd, result.schema) java.lang.NullPointerException at org.apache.spark.sql.catalyst.expressions.AttributeReference.hashCode(namedExpressions.scala:147) at scala.runtime.ScalaRunTime$.hash(ScalaRunTime.scala:210) at scala.util.hashing.MurmurHash3.listHash(MurmurHash3.scala:168) at scala.util.hashing.MurmurHash3$.seqHash(MurmurHash3.scala:216) at scala.collection.LinearSeqLike$class.hashCode(LinearSeqLike.scala:53) at scala.collection.immutable.List.hashCode(List.scala:84) at scala.runtime.ScalaRunTime$.hash(ScalaRunTime.scala:210) at scala.util.hashing.MurmurHash3.productHash(MurmurHash3.scala:63) at scala.util.hashing.MurmurHash3$.productHash(MurmurHash3.scala:210) at scala.runtime.ScalaRunTime$._hashCode(ScalaRunTime.scala:172) at org.apache.spark.sql.execution.LogicalRDD.hashCode(ExistingRDD.scala:58) at scala.runtime.ScalaRunTime$.hash(ScalaRunTime.scala:210) at scala.collection.mutable.HashTable$HashUtils$class.elemHashCode(HashTable.scala:398) at scala.collection.mutable.HashMap.elemHashCode(HashMap.scala:39) at scala.collection.mutable.HashTable$class.findEntry(HashTable.scala:130) at scala.collection.mutable.HashMap.findEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.get(HashMap.scala:69) at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:187) at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91) at scala.collection.TraversableLike$$anonfun$groupBy$1.apply(TraversableLike.scala:329) at scala.collection.TraversableLike$$anonfun$groupBy$1.apply(TraversableLike.scala:327) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$class.groupBy(TraversableLike.scala:327) at scala.collection.AbstractTraversable.groupBy(Traversable.scala:105) at org.apache.spark.sql.catalyst.analysis.NewRelationInstances$.apply(MultiInstanceRelation.scala:44) at org.apache.spark.sql.catalyst.analysis.NewRelationInstances$.apply(MultiInstanceRelation.scala:40) at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61) at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59) at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51) at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60) at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34) at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59) at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51) at 
org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411) at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411) at org.apache.spark.sql.SchemaRDD.schema$lzycompute(SchemaRDD.scala:135) at org.apache.spark.sql.SchemaRDD.schema(SchemaRDD.scala:135) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5302) Add support for SQLContext partition columns
[ https://issues.apache.org/jira/browse/SPARK-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bob Tiernay updated SPARK-5302: --- Description: For {{SQLContext}} (not {{HiveContext}}) it would be very convenient to support a virtual column that maps to part of the the file path, similar to what is done in Hive for partitions (e.g. {{/data/clicks/dt=2015-01-01/}}). The API could allow the user to type the column using an appropriate {{DataType}} instance. This new field could be addressed in SQL statements much the same as is done in Hive. As a consequence, pruning of partitions could be possible when executing a query and also remove the need to materialize a column in each logical partition that is already encoded in the path name. Furthermore, this would provide an nice interop and migration strategy for Hive users who may one day use {{SQLContext}} directly. (was: For {{SQLContext}} (not {{HiveContext}}) it would be very convenient to support a virtual column that maps to part of the the file path, similar to what is done in Hive for partitions. The API could allow the user to type the column using an appropriate {{DataType}} instance. This new field could be addressed in SQL statements much the same as is done in Hive. As a consequence, pruning of partitions could be possible when executing a query and also remove the need to materialize a column in each logical partition that is already encoded in the path name. Furthermore, this would provide an nice interop and migration strategy for Hive users who may one day use {{SQLContext}} directly.) Add support for SQLContext partition columns -- Key: SPARK-5302 URL: https://issues.apache.org/jira/browse/SPARK-5302 Project: Spark Issue Type: New Feature Components: SQL Reporter: Bob Tiernay For {{SQLContext}} (not {{HiveContext}}) it would be very convenient to support a virtual column that maps to part of the the file path, similar to what is done in Hive for partitions (e.g. {{/data/clicks/dt=2015-01-01/}}). The API could allow the user to type the column using an appropriate {{DataType}} instance. This new field could be addressed in SQL statements much the same as is done in Hive. As a consequence, pruning of partitions could be possible when executing a query and also remove the need to materialize a column in each logical partition that is already encoded in the path name. Furthermore, this would provide an nice interop and migration strategy for Hive users who may one day use {{SQLContext}} directly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5306) Support for a NotEqualsFilter in the filter PrunedFilteredScan pushdown
[ https://issues.apache.org/jira/browse/SPARK-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Corey J. Nolet updated SPARK-5306: -- Component/s: SQL Support for a NotEqualsFilter in the filter PrunedFilteredScan pushdown --- Key: SPARK-5306 URL: https://issues.apache.org/jira/browse/SPARK-5306 Project: Spark Issue Type: Improvement Components: SQL Reporter: Corey J. Nolet This would be a pretty significant addition to the Filters that get pushed down. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
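A sketch of where such a filter would slot into a PrunedFilteredScan implementation. The not-equals case is hypothetical and shown only in comments, since this ticket is precisely about it not existing in the pushed-down filter vocabulary yet:
{code}
// The data sources filter vocabulary in org.apache.spark.sql.sources includes
// EqualTo, GreaterThan, LessThan, In, ... A not-equals filter might look like
// (names are assumptions, not the committed API):
//
//   case class NotEqualTo(attribute: String, value: Any) extends Filter
//
import org.apache.spark.sql.sources.{EqualTo, Filter}

// A data source's buildScan(requiredColumns, filters) could translate pushed
// filters into its own scan predicates, e.g. over a Map-shaped row:
def compile(f: Filter): Option[Map[String, Any] => Boolean] = f match {
  case EqualTo(attr, v) => Some(row => row.get(attr).exists(_ == v))
  // hypothetical: case NotEqualTo(attr, v) => Some(row => !row.get(attr).exists(_ == v))
  case _                => None // unsupported filters fall back to Spark-side evaluation
}
{code}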
[jira] [Created] (SPARK-5306) Support for a NotEqualsFilter in the filter PrunedFilteredScan pushdown
Corey J. Nolet created SPARK-5306: - Summary: Support for a NotEqualsFilter in the filter PrunedFilteredScan pushdown Key: SPARK-5306 URL: https://issues.apache.org/jira/browse/SPARK-5306 Project: Spark Issue Type: Improvement Reporter: Corey J. Nolet This would be a pretty significant addition to the Filters that get pushed down. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5304) applySchema returns NullPointerException
Mauro Pirrone created SPARK-5304: Summary: applySchema returns NullPointerException Key: SPARK-5304 URL: https://issues.apache.org/jira/browse/SPARK-5304 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Mauro Pirrone The following code snippet returns NullPointerException: val result = . val rows = result.take(10) val rowRdd = SparkManager.getContext().parallelize(rows, 1) val schemaRdd = SparkManager.getSQLContext().applySchema(rowRdd, result.schema) java.lang.NullPointerException at org.apache.spark.sql.catalyst.expressions.AttributeReference.hashCode(namedExpressions.scala:147) at scala.runtime.ScalaRunTime$.hash(ScalaRunTime.scala:210) at scala.util.hashing.MurmurHash3.listHash(MurmurHash3.scala:168) at scala.util.hashing.MurmurHash3$.seqHash(MurmurHash3.scala:216) at scala.collection.LinearSeqLike$class.hashCode(LinearSeqLike.scala:53) at scala.collection.immutable.List.hashCode(List.scala:84) at scala.runtime.ScalaRunTime$.hash(ScalaRunTime.scala:210) at scala.util.hashing.MurmurHash3.productHash(MurmurHash3.scala:63) at scala.util.hashing.MurmurHash3$.productHash(MurmurHash3.scala:210) at scala.runtime.ScalaRunTime$._hashCode(ScalaRunTime.scala:172) at org.apache.spark.sql.execution.LogicalRDD.hashCode(ExistingRDD.scala:58) at scala.runtime.ScalaRunTime$.hash(ScalaRunTime.scala:210) at scala.collection.mutable.HashTable$HashUtils$class.elemHashCode(HashTable.scala:398) at scala.collection.mutable.HashMap.elemHashCode(HashMap.scala:39) at scala.collection.mutable.HashTable$class.findEntry(HashTable.scala:130) at scala.collection.mutable.HashMap.findEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.get(HashMap.scala:69) at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:187) at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91) at scala.collection.TraversableLike$$anonfun$groupBy$1.apply(TraversableLike.scala:329) at scala.collection.TraversableLike$$anonfun$groupBy$1.apply(TraversableLike.scala:327) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$class.groupBy(TraversableLike.scala:327) at scala.collection.AbstractTraversable.groupBy(Traversable.scala:105) at org.apache.spark.sql.catalyst.analysis.NewRelationInstances$.apply(MultiInstanceRelation.scala:44) at org.apache.spark.sql.catalyst.analysis.NewRelationInstances$.apply(MultiInstanceRelation.scala:40) at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61) at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59) at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51) at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60) at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34) at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59) at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51) at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411) at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411) at 
org.apache.spark.sql.SchemaRDD.schema$lzycompute(SchemaRDD.scala:135) at org.apache.spark.sql.SchemaRDD.schema(SchemaRDD.scala:135) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5303) applySchema returns NullPointerException
Mauro Pirrone created SPARK-5303: Summary: applySchema returns NullPointerException Key: SPARK-5303 URL: https://issues.apache.org/jira/browse/SPARK-5303 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.2.0 Reporter: Mauro Pirrone The following code snippet returns NullPointerException: val result = . val rows = result.take(10) val rowRdd = SparkManager.getContext().parallelize(rows, 1) val schemaRdd = SparkManager.getSQLContext().applySchema(rowRdd, result.schema) java.lang.NullPointerException at org.apache.spark.sql.catalyst.expressions.AttributeReference.hashCode(namedExpressions.scala:147) at scala.runtime.ScalaRunTime$.hash(ScalaRunTime.scala:210) at scala.util.hashing.MurmurHash3.listHash(MurmurHash3.scala:168) at scala.util.hashing.MurmurHash3$.seqHash(MurmurHash3.scala:216) at scala.collection.LinearSeqLike$class.hashCode(LinearSeqLike.scala:53) at scala.collection.immutable.List.hashCode(List.scala:84) at scala.runtime.ScalaRunTime$.hash(ScalaRunTime.scala:210) at scala.util.hashing.MurmurHash3.productHash(MurmurHash3.scala:63) at scala.util.hashing.MurmurHash3$.productHash(MurmurHash3.scala:210) at scala.runtime.ScalaRunTime$._hashCode(ScalaRunTime.scala:172) at org.apache.spark.sql.execution.LogicalRDD.hashCode(ExistingRDD.scala:58) at scala.runtime.ScalaRunTime$.hash(ScalaRunTime.scala:210) at scala.collection.mutable.HashTable$HashUtils$class.elemHashCode(HashTable.scala:398) at scala.collection.mutable.HashMap.elemHashCode(HashMap.scala:39) at scala.collection.mutable.HashTable$class.findEntry(HashTable.scala:130) at scala.collection.mutable.HashMap.findEntry(HashMap.scala:39) at scala.collection.mutable.HashMap.get(HashMap.scala:69) at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:187) at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91) at scala.collection.TraversableLike$$anonfun$groupBy$1.apply(TraversableLike.scala:329) at scala.collection.TraversableLike$$anonfun$groupBy$1.apply(TraversableLike.scala:327) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at scala.collection.TraversableLike$class.groupBy(TraversableLike.scala:327) at scala.collection.AbstractTraversable.groupBy(Traversable.scala:105) at org.apache.spark.sql.catalyst.analysis.NewRelationInstances$.apply(MultiInstanceRelation.scala:44) at org.apache.spark.sql.catalyst.analysis.NewRelationInstances$.apply(MultiInstanceRelation.scala:40) at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61) at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59) at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51) at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60) at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34) at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59) at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51) at scala.collection.immutable.List.foreach(List.scala:318) at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51) at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411) at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411) at 
org.apache.spark.sql.SchemaRDD.schema$lzycompute(SchemaRDD.scala:135) at org.apache.spark.sql.SchemaRDD.schema(SchemaRDD.scala:135) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
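For readers trying to reproduce the above, a self-contained sketch of the reported pattern, assuming {{SparkManager}} is the reporter's own wrapper around the contexts (replaced here with a plain SparkContext/SQLContext) and using a toy JSON source in place of the elided {{val result = ...}}. Whether it hits the same NullPointerException likely depends on how {{result}} was originally built; this only shows the call sequence the stack trace points at.
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ApplySchemaRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SPARK-5303-repro").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)

    // Toy stand-in for the reporter's elided "val result = ..." SchemaRDD.
    val result = sqlContext.jsonRDD(
      sc.parallelize(Seq("""{"a": 1, "b": "x"}""", """{"a": 2, "b": "y"}""")))

    // Take a small sample, re-parallelize it, and re-apply the original schema.
    val rows = result.take(10)
    val rowRdd = sc.parallelize(rows, 1)
    val schemaRdd = sqlContext.applySchema(rowRdd, result.schema)

    schemaRdd.collect().foreach(println)
    sc.stop()
  }
}
{code}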
[jira] [Created] (SPARK-5307) Add utility to help with NotSerializableException debugging
Reynold Xin created SPARK-5307: -- Summary: Add utility to help with NotSerializableException debugging Key: SPARK-5307 URL: https://issues.apache.org/jira/browse/SPARK-5307 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Reynold Xin Assignee: Reynold Xin Scala closures can easily capture objects unintentionally, especially with implicit arguments. I think we can do more than just relying on the users being smart about using sun.io.serialization.extendedDebugInfo to find more debug information. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
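To make the failure mode concrete, a small sketch (not part of any proposed utility) of how a closure can capture an enclosing object unintentionally; today the main recourse is running the JVM with {{-Dsun.io.serialization.extendedDebugInfo=true}} and reading the resulting serialization trace.
{code}
import org.apache.spark.SparkContext

// Not Serializable: holds a SparkContext plus some helper logic.
class ReportBuilder(sc: SparkContext) {
  private def scale(x: Int): Int = x * 2

  def run(): Long = {
    val data = sc.parallelize(1 to 1000)
    // 'scale' is a method on 'this', so the closure drags the whole ReportBuilder
    // (including the SparkContext) into the serialized task and task serialization
    // fails with a NotSerializableException.
    data.map(x => scale(x)).count()
  }

  def runFixed(): Long = {
    val data = sc.parallelize(1 to 1000)
    // Copy the logic into a local function first; only the function is captured.
    val scaleLocal: Int => Int = _ * 2
    data.map(scaleLocal).count()
  }
}
{code}
A debugging utility could walk the serialized object graph and report exactly which field pulled in the non-serializable reference, rather than leaving users to decode the JVM's extended debug output.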
[jira] [Created] (SPARK-5302) Add support for SQLContext partition columns
Bob Tiernay created SPARK-5302: -- Summary: Add support for SQLContext partition columns Key: SPARK-5302 URL: https://issues.apache.org/jira/browse/SPARK-5302 Project: Spark Issue Type: New Feature Components: SQL Reporter: Bob Tiernay For {{SQLContext}} (not {{HiveContext}}) it would be very convenient to support a virtual column that maps to part of the file path, similar to what is done in Hive for partitions. The API could allow the user to type the column using an appropriate {{DataType}} instance. This new field could be addressed in SQL statements much the same as is done in Hive. As a consequence, this would provide a nice interop and migration strategy for Hive users who may one day use {{SQLContext}} directly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-3694) Allow printing object graph of tasks/RDD's with a debug flag
[ https://issues.apache.org/jira/browse/SPARK-3694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-3694. Resolution: Duplicate Allow printing object graph of tasks/RDD's with a debug flag Key: SPARK-3694 URL: https://issues.apache.org/jira/browse/SPARK-3694 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Patrick Wendell Assignee: Ilya Ganelin Labels: starter This would be useful for debugging extra references inside of RDDs. Here is an example for inspiration: http://ehcache.org/xref/net/sf/ehcache/pool/sizeof/ObjectGraphWalker.html We'd want to print this trace for both the RDD serialization inside of the DAGScheduler and the task serialization in the TaskSetManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
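For reference, a rough reflection-based sketch of what such an object-graph dump could look like; this is illustration only, not proposed Spark code, and the ehcache ObjectGraphWalker linked above is considerably more careful.
{code}
import java.lang.reflect.Modifier
import java.util.{Collections, IdentityHashMap}

object ObjectGraphDebug {
  /** Print one line per reachable reference, to spot unexpected captures in a task or RDD. */
  def printGraph(root: AnyRef, maxDepth: Int = 5): Unit = {
    // Identity-based visited set, so cycles and shared references are walked only once.
    val seen = Collections.newSetFromMap(new IdentityHashMap[AnyRef, java.lang.Boolean]())

    def walk(obj: AnyRef, path: String, depth: Int): Unit = {
      if (obj == null || depth > maxDepth || !seen.add(obj)) return
      println(s"$path -> ${obj.getClass.getName}")
      var cls: Class[_] = obj.getClass
      while (cls != null) {
        for (f <- cls.getDeclaredFields if !Modifier.isStatic(f.getModifiers) && !f.getType.isPrimitive) {
          f.setAccessible(true)
          walk(f.get(obj), s"$path.${f.getName}", depth + 1)
        }
        cls = cls.getSuperclass
      }
    }

    walk(root, root.getClass.getSimpleName, 0)
  }
}
{code}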
[jira] [Updated] (SPARK-5302) Add support for SQLContext partition columns
[ https://issues.apache.org/jira/browse/SPARK-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bob Tiernay updated SPARK-5302: --- Description: For {{SQLContext}} (not {{HiveContext}}) it would be very convenient to support a virtual column that maps to part of the file path, similar to what is done in Hive for partitions (e.g. {{/data/clicks/dt=2015-01-01/}} where {{dt}} is a field of type {{TEXT}}). The API could allow the user to type the column using an appropriate {{DataType}} instance. This new field could be addressed in SQL statements much the same as is done in Hive. As a consequence, pruning of partitions could be possible when executing a query, and it would also remove the need to materialize a column in each logical partition that is already encoded in the path name. Furthermore, this would provide a nice interop and migration strategy for Hive users who may one day use {{SQLContext}} directly. was:For {{SQLContext}} (not {{HiveContext}}) it would be very convenient to support a virtual column that maps to part of the file path, similar to what is done in Hive for partitions (e.g. {{/data/clicks/dt=2015-01-01/}}). The API could allow the user to type the column using an appropriate {{DataType}} instance. This new field could be addressed in SQL statements much the same as is done in Hive. As a consequence, pruning of partitions could be possible when executing a query, and it would also remove the need to materialize a column in each logical partition that is already encoded in the path name. Furthermore, this would provide a nice interop and migration strategy for Hive users who may one day use {{SQLContext}} directly. Add support for SQLContext partition columns -- Key: SPARK-5302 URL: https://issues.apache.org/jira/browse/SPARK-5302 Project: Spark Issue Type: New Feature Components: SQL Reporter: Bob Tiernay For {{SQLContext}} (not {{HiveContext}}) it would be very convenient to support a virtual column that maps to part of the file path, similar to what is done in Hive for partitions (e.g. {{/data/clicks/dt=2015-01-01/}} where {{dt}} is a field of type {{TEXT}}). The API could allow the user to type the column using an appropriate {{DataType}} instance. This new field could be addressed in SQL statements much the same as is done in Hive. As a consequence, pruning of partitions could be possible when executing a query, and it would also remove the need to materialize a column in each logical partition that is already encoded in the path name. Furthermore, this would provide a nice interop and migration strategy for Hive users who may one day use {{SQLContext}} directly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5302) Add support for SQLContext partition columns
[ https://issues.apache.org/jira/browse/SPARK-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bob Tiernay updated SPARK-5302: --- Description: For {{SQLContext}} (not {{HiveContext}}) it would be very convenient to support a virtual column that maps to part of the file path, similar to what is done in Hive for partitions (e.g. {{/data/clicks/dt=2015-01-01/}} where {{dt}} is a column of type {{TEXT}}). The API could allow the user to type the column using an appropriate {{DataType}} instance. This new field could be addressed in SQL statements much the same as is done in Hive. As a consequence, pruning of partitions could be possible when executing a query, and it would also remove the need to materialize a column in each logical partition that is already encoded in the path name. Furthermore, this would provide a nice interop and migration strategy for Hive users who may one day use {{SQLContext}} directly. was: For {{SQLContext}} (not {{HiveContext}}) it would be very convenient to support a virtual column that maps to part of the file path, similar to what is done in Hive for partitions (e.g. {{/data/clicks/dt=2015-01-01/}} where {{dt}} is a field of type {{TEXT}}). The API could allow the user to type the column using an appropriate {{DataType}} instance. This new field could be addressed in SQL statements much the same as is done in Hive. As a consequence, pruning of partitions could be possible when executing a query, and it would also remove the need to materialize a column in each logical partition that is already encoded in the path name. Furthermore, this would provide a nice interop and migration strategy for Hive users who may one day use {{SQLContext}} directly. Add support for SQLContext partition columns -- Key: SPARK-5302 URL: https://issues.apache.org/jira/browse/SPARK-5302 Project: Spark Issue Type: New Feature Components: SQL Reporter: Bob Tiernay For {{SQLContext}} (not {{HiveContext}}) it would be very convenient to support a virtual column that maps to part of the file path, similar to what is done in Hive for partitions (e.g. {{/data/clicks/dt=2015-01-01/}} where {{dt}} is a column of type {{TEXT}}). The API could allow the user to type the column using an appropriate {{DataType}} instance. This new field could be addressed in SQL statements much the same as is done in Hive. As a consequence, pruning of partitions could be possible when executing a query, and it would also remove the need to materialize a column in each logical partition that is already encoded in the path name. Furthermore, this would provide a nice interop and migration strategy for Hive users who may one day use {{SQLContext}} directly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
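To make the proposal concrete, a purely hypothetical usage sketch; {{withPartitionColumn}} does not exist anywhere in Spark and is invented here only to show the intended shape of the feature.
{code}
import org.apache.spark.sql.SQLContext

// Everything marked "proposed" below is invented for illustration; only
// parquetFile, registerTempTable and sql are real Spark 1.2 APIs.
def example(sqlContext: SQLContext): Unit = {
  val clicks = sqlContext.parquetFile("/data/clicks")

  // Proposed (invented): expose the dt=... segment of each file's path as a typed virtual column.
  // val partitioned = clicks.withPartitionColumn("dt", StringType)
  // partitioned.registerTempTable("clicks")

  // With the proposal, a predicate on dt could skip whole directories
  // (everything outside /data/clicks/dt=2015-01-01/) instead of scanning every file:
  // sqlContext.sql("SELECT count(*) FROM clicks WHERE dt = '2015-01-01'")

  // What works today: the table is registered without any path-derived column.
  clicks.registerTempTable("clicks_raw")
}
{code}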
[jira] [Resolved] (SPARK-5096) SparkBuild.scala assumes you are at the spark root dir
[ https://issues.apache.org/jira/browse/SPARK-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5096. Resolution: Fixed Fix Version/s: 1.3.0 SparkBuild.scala assumes you are at the spark root dir -- Key: SPARK-5096 URL: https://issues.apache.org/jira/browse/SPARK-5096 Project: Spark Issue Type: Bug Components: Build Reporter: Michael Armbrust Assignee: Michael Armbrust Fix For: 1.3.0 This is bad because it breaks compiling spark as an external project ref and is generally bad SBT practice. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
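As a generic illustration of the underlying problem (not the actual SparkBuild.scala code): paths built with {{file(...)}} resolve against whatever directory sbt was launched from, while paths derived from {{baseDirectory}} survive being referenced as an external project.
{code}
import sbt._
import Keys._

// Fragile: "python" resolves against the JVM's working directory, i.e. wherever
// sbt happened to be launched, so it breaks when this build is consumed as an
// external ProjectRef from another project.
lazy val fragileSettings: Seq[Setting[_]] = Seq(
  unmanagedResourceDirectories in Compile += file("python")
)

// Robust: resolve against this build's own base directory instead.
lazy val robustSettings: Seq[Setting[_]] = Seq(
  unmanagedResourceDirectories in Compile += baseDirectory.value / "python"
)
{code}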
[jira] [Updated] (SPARK-5096) SparkBuild.scala assumes you are at the spark root dir
[ https://issues.apache.org/jira/browse/SPARK-5096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-5096: --- Target Version/s: (was: 1.0.3) SparkBuild.scala assumes you are at the spark root dir -- Key: SPARK-5096 URL: https://issues.apache.org/jira/browse/SPARK-5096 Project: Spark Issue Type: Bug Components: Build Reporter: Michael Armbrust Assignee: Michael Armbrust Fix For: 1.3.0 This is bad because it breaks compiling spark as an external project ref and is generally bad SBT practice. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5279) Use java.math.BigDecimal as the exposed Decimal type
[ https://issues.apache.org/jira/browse/SPARK-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281645#comment-14281645 ] Apache Spark commented on SPARK-5279: - User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/4092 Use java.math.BigDecimal as the exposed Decimal type Key: SPARK-5279 URL: https://issues.apache.org/jira/browse/SPARK-5279 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Change it from scala.BigDecimal to java.math.BigDecimal. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
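For context on what the change means for callers, a standalone sketch (not Spark code) of how the two types interoperate; scala.math.BigDecimal is a thin wrapper around the Java class, so converting in either direction is cheap.
{code}
object DecimalInterop {
  def main(args: Array[String]): Unit = {
    val javaBd: java.math.BigDecimal = new java.math.BigDecimal("123.45")

    // Wrapping a java.math.BigDecimal in a scala.math.BigDecimal...
    val scalaBd: scala.math.BigDecimal = scala.math.BigDecimal(javaBd)

    // ...and unwrapping it again; both refer to the same underlying value.
    val backToJava: java.math.BigDecimal = scalaBd.bigDecimal

    println(javaBd == backToJava)           // true
    println(scalaBd.underlying() eq javaBd) // true: underlying() returns the wrapped instance
  }
}
{code}
Exposing the Java type directly spares Java and JDBC-style callers a conversion, while Scala callers can still wrap the value as above when they want operator syntax.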
[jira] [Resolved] (SPARK-5289) Backport publishing of repl, yarn into branch-1.2
[ https://issues.apache.org/jira/browse/SPARK-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-5289. Resolution: Fixed Backport publishing of repl, yarn into branch-1.2 - Key: SPARK-5289 URL: https://issues.apache.org/jira/browse/SPARK-5289 Project: Spark Issue Type: Improvement Reporter: Patrick Wendell Assignee: Patrick Wendell Priority: Blocker In SPARK-3452 we did some clean-up of published artifacts that turned out to adversely affect some users. This has been mostly patched up in master via SPARK-4925 (hive-thriftserver), which was backported. The repl and yarn modules were fixed in SPARK-4048 as part of a larger change that only went into master. Those pieces should be backported to Spark 1.2 to allow publishing in a 1.2.1 release. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5296) Predicate Pushdown (BaseRelation) to have an interface that will accept OR filters
[ https://issues.apache.org/jira/browse/SPARK-5296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281621#comment-14281621 ] Corey J. Nolet commented on SPARK-5296: --- The more I'm thinking about this, it would be nice if there was a tree pushed down for the filters instead of an Array. This is a significant change to the API, so it would still probably be easiest to create a new class (PrunedFilteredTreeScan?). Probably easiest to have AndFilter and OrFilter parent nodes that can be arbitrarily nested, with the leaf nodes being the filters that are already used (hopefully with the addition of the NotEqualsFilter from SPARK-5306). Predicate Pushdown (BaseRelation) to have an interface that will accept OR filters -- Key: SPARK-5296 URL: https://issues.apache.org/jira/browse/SPARK-5296 Project: Spark Issue Type: Improvement Components: SQL Reporter: Corey J. Nolet Currently, the BaseRelation API allows a FilteredRelation to handle an Array[Filter] which represents filter expressions that are applied as an AND operator. We should support OR operations in a BaseRelation as well. I'm not sure what this would look like in terms of API changes, but it almost seems like a FilteredUnionedScan BaseRelation (the name stinks but you get the idea) would be useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
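A sketch of the nested filter tree being suggested; {{AndFilter}}, {{OrFilter}} and {{PrunedFilteredTreeScan}} come from the comment above and are not existing Spark API, and the leaf classes here only mirror the idea of the current sources.Filter types.
{code}
// Hypothetical filter tree for predicate pushdown: leaves correspond to the
// existing flat filters, inner nodes allow arbitrary nesting of AND / OR.
sealed trait FilterNode
case class EqualTo(attribute: String, value: Any) extends FilterNode
case class GreaterThan(attribute: String, value: Any) extends FilterNode
case class NotEqualTo(attribute: String, value: Any) extends FilterNode // cf. SPARK-5306
case class AndFilter(children: Seq[FilterNode]) extends FilterNode
case class OrFilter(children: Seq[FilterNode]) extends FilterNode

// (age > 21 AND country = "US") OR name != "unknown"
val pushedDown: FilterNode = OrFilter(Seq(
  AndFilter(Seq(GreaterThan("age", 21), EqualTo("country", "US"))),
  NotEqualTo("name", "unknown")
))

// A relation implementing the hypothetical PrunedFilteredTreeScan would receive
// the whole tree and decide how much of it it can evaluate natively.
{code}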
[jira] [Updated] (SPARK-4920) current spark version in UI is not striking
[ https://issues.apache.org/jira/browse/SPARK-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4920: -- Target Version/s: 1.0.3 (was: 1.0.3, 1.2.1) current spark version in UI is not striking --- Key: SPARK-4920 URL: https://issues.apache.org/jira/browse/SPARK-4920 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 1.2.0 Reporter: uncleGen Assignee: uncleGen Priority: Minor Labels: backport-needed Fix For: 1.1.1, 1.2.1 It is not convenient to see the Spark version. We can keep the same style as the Spark website. !https://raw.githubusercontent.com/uncleGen/Tech-Notes/master/spark_version.jpg! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4920) current spark version in UI is not striking
[ https://issues.apache.org/jira/browse/SPARK-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4920: -- Target Version/s: 1.0.3, 1.2.1 (was: 1.1.1, 1.0.3, 1.2.1) Fix Version/s: 1.1.1 current spark version in UI is not striking --- Key: SPARK-4920 URL: https://issues.apache.org/jira/browse/SPARK-4920 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 1.2.0 Reporter: uncleGen Assignee: uncleGen Priority: Minor Labels: backport-needed Fix For: 1.1.1, 1.2.1 It is not convenient to see the Spark version. We can keep the same style as the Spark website. !https://raw.githubusercontent.com/uncleGen/Tech-Notes/master/spark_version.jpg! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5198) Change executorId more unique on mesos fine-grained mode
[ https://issues.apache.org/jira/browse/SPARK-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jongyoul Lee updated SPARK-5198: Issue Type: Bug (was: Improvement) Change executorId more unique on mesos fine-grained mode Key: SPARK-5198 URL: https://issues.apache.org/jira/browse/SPARK-5198 Project: Spark Issue Type: Bug Components: Mesos Reporter: Jongyoul Lee Fix For: 1.3.0, 1.2.1 Attachments: Screen Shot 2015-01-12 at 11.14.39 AM.png, Screen Shot 2015-01-12 at 11.34.30 AM.png, Screen Shot 2015-01-12 at 11.34.41 AM.png In fine-grained mode, the SchedulerBackend sets the executor ID to the slave ID, regardless of the task ID. This makes it hard to trace a specific job, because log lines from different jobs end up in the same log file. The same value is also used when launching jobs in coarse-grained mode. !Screen Shot 2015-01-12 at 11.14.39 AM.png! !Screen Shot 2015-01-12 at 11.34.30 AM.png! !Screen Shot 2015-01-12 at 11.34.41 AM.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5221) FileInputDStream remember window in certain situations causes files to be ignored
[ https://issues.apache.org/jira/browse/SPARK-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jem Tucker updated SPARK-5221: -- Priority: Major (was: Minor) FileInputDStream remember window in certain situations causes files to be ignored Key: SPARK-5221 URL: https://issues.apache.org/jira/browse/SPARK-5221 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.1.1, 1.2.0 Reporter: Jem Tucker When batch intervals are greater than 1 minute, a file that begins to be moved into a directory just before FileInputDStream.findNewFiles() is called, but does not become visible until after findNewFiles() has executed, is not included in that batch. The file is then ignored in the following batch as well, because its modification time is less than the modTimeIgnoreThreshold. This causes Spark Streaming to ignore data that should not be ignored, especially when large files are being moved into the directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
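A simplified model of the timing described above; this is not the actual FileInputDStream code, just the shape of the selection check that causes the miss, with invented timestamps.
{code}
// Simplified model: a file is picked up only if it is visible when findNewFiles()
// runs AND its modification time is at least the ignore threshold derived from
// the remember window.
case class FileInfo(path: String, modTime: Long, visibleAt: Long)

def selectedInBatch(f: FileInfo, findNewFilesTime: Long, modTimeIgnoreThreshold: Long): Boolean =
  f.visibleAt <= findNewFilesTime && f.modTime >= modTimeIgnoreThreshold

// Batch interval of 2 minutes: the move starts just before the scan at t=120s,
// the file only becomes visible at t=121s, and its modTime is stamped at t=119s.
val file = FileInfo("/in/data.gz", modTime = 119000L, visibleAt = 121000L)

// Missed in this batch: not yet visible at scan time.
val missedNow = !selectedInBatch(file, findNewFilesTime = 120000L, modTimeIgnoreThreshold = 0L)
// Missed again in the next batch: the threshold has moved past the file's modTime.
val missedNext = !selectedInBatch(file, findNewFilesTime = 240000L, modTimeIgnoreThreshold = 120000L)
{code}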
[jira] [Commented] (SPARK-1812) Support cross-building with Scala 2.11
[ https://issues.apache.org/jira/browse/SPARK-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281361#comment-14281361 ] François Garillot commented on SPARK-1812: -- Hem. Both issues are now closed. Pinging [~pwendell]. Support cross-building with Scala 2.11 -- Key: SPARK-1812 URL: https://issues.apache.org/jira/browse/SPARK-1812 Project: Spark Issue Type: New Feature Components: Build, Spark Core Reporter: Matei Zaharia Assignee: Prashant Sharma Fix For: 1.2.0 Since Scala 2.10/2.11 are source compatible, we should be able to cross-build for both versions. From what I understand there are basically two things we need to figure out: 1. Have two versions of our dependency graph, one that uses 2.11 dependencies and the other that uses 2.10 dependencies. 2. Figure out how to publish different poms for 2.10 and 2.11. I think (1) can be accomplished by having a scala 2.11 profile. (2) isn't really well supported by Maven since published pom's aren't generated dynamically. But we can probably script around it to make it work. I've done some initial sanity checks with a simple build here: https://github.com/pwendell/scala-maven-crossbuild -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
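For readers unfamiliar with cross-building, a minimal sbt-style sketch of the general idea; the Spark build in question is Maven-based and the actual work went through a scala-2.11 profile, so this is illustration only, with invented dependency choices.
{code}
// build.sbt sketch: build and publish against multiple Scala versions.
scalaVersion := "2.10.4"
crossScalaVersions := Seq("2.10.4", "2.11.2")

// Version-specific dependencies selected per Scala binary version.
libraryDependencies ++= (CrossVersion.partialVersion(scalaVersion.value) match {
  case Some((2, 11)) => Seq("org.scala-lang.modules" %% "scala-xml" % "1.0.2")
  case _             => Seq.empty
})

// Running `sbt +compile` or `sbt +publish` then repeats the task for every
// version listed in crossScalaVersions.
{code}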
[jira] [Commented] (SPARK-4937) Adding optimization to simplify the filter condition
[ https://issues.apache.org/jira/browse/SPARK-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281384#comment-14281384 ] Apache Spark commented on SPARK-4937: - User 'scwf' has created a pull request for this issue: https://github.com/apache/spark/pull/4086 Adding optimization to simplify the filter condition Key: SPARK-4937 URL: https://issues.apache.org/jira/browse/SPARK-4937 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.1.0 Reporter: wangfei Assignee: Cheng Lian Fix For: 1.3.0 Adding optimization to simplify the filter condition: 1 condition that can get the boolean result such as: a 3 a 5 False a 1 || a 0 True 2 Simplify And, Or condition, such as the sql (one of hive-testbench ): select sum(l_extendedprice* (1 - l_discount)) as revenue from lineitem, part where ( p_partkey = l_partkey and p_brand = 'Brand#32' and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') and l_quantity = 7 and l_quantity = 7 + 10 and p_size between 1 and 5 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#35' and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK') and l_quantity = 15 and l_quantity = 15 + 10 and p_size between 1 and 10 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ) or ( p_partkey = l_partkey and p_brand = 'Brand#24' and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG') and l_quantity = 26 and l_quantity = 26 + 10 and p_size between 1 and 15 and l_shipmode in ('AIR', 'AIR REG') and l_shipinstruct = 'DELIVER IN PERSON' ); Before optimized it is a CartesianProduct, in my locally test this sql hang and can not get result, after optimization the CartesianProduct replaced by ShuffledHashJoin, which only need 20+ seconds to run this sql. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
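As a standalone illustration of the first kind of simplification (constant-folding a contradictory predicate), independent of the actual Catalyst rule in the pull request; the tiny expression language below is invented purely for the example.
{code}
// Minimal expression language, just to show constant-folding of filter predicates.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Int) extends Expr
case class Gt(left: Expr, right: Expr) extends Expr   // left > right
case class Lt(left: Expr, right: Expr) extends Expr   // left < right
case class And(left: Expr, right: Expr) extends Expr
case class Or(left: Expr, right: Expr) extends Expr
case class BoolLit(value: Boolean) extends Expr

def simplify(e: Expr): Expr = e match {
  // a > x && a < y is unsatisfiable when x >= y, so fold it to false.
  case And(Gt(Attr(a), Lit(x)), Lt(Attr(b), Lit(y))) if a == b && x >= y => BoolLit(false)
  case And(l, r) => (simplify(l), simplify(r)) match {
    case (BoolLit(false), _) | (_, BoolLit(false)) => BoolLit(false)
    case (BoolLit(true), rr) => rr
    case (ll, BoolLit(true)) => ll
    case (ll, rr) => And(ll, rr)
  }
  case Or(l, r) => (simplify(l), simplify(r)) match {
    case (BoolLit(true), _) | (_, BoolLit(true)) => BoolLit(true)
    case (BoolLit(false), rr) => rr
    case (ll, BoolLit(false)) => ll
    case (ll, rr) => Or(ll, rr)
  }
  case other => other
}

// (a > 5 && a < 3) || (a > 1)  simplifies to  a > 1
val simplified = simplify(Or(And(Gt(Attr("a"), Lit(5)), Lt(Attr("a"), Lit(3))), Gt(Attr("a"), Lit(1))))
{code}
In the query above, once the unsatisfiable conjuncts fold away, the shared join condition can be lifted out of the OR, which is what replaces the CartesianProduct with a ShuffledHashJoin.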
[jira] [Created] (SPARK-5297) File Streams do not work with custom key/values
Leonidas Fegaras created SPARK-5297: --- Summary: File Streams do not work with custom key/values Key: SPARK-5297 URL: https://issues.apache.org/jira/browse/SPARK-5297 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.2.0 Reporter: Leonidas Fegaras Priority: Minor Fix For: 1.2.0 The following code: {code} stream_context.<K,V,SequenceFileInputFormat<K,V>>fileStream(directory) .foreachRDD(new Function<JavaPairRDD<K,V>,Void>() { public Void call ( JavaPairRDD<K,V> rdd ) throws Exception { for ( Tuple2<K,V> x: rdd.collect() ) System.out.println("# "+x._1+" "+x._2); return null; } }); stream_context.start(); stream_context.awaitTermination(); {code} for custom (serializable) classes K and V compiles fine but gives an error when I drop a new Hadoop sequence file in the directory: {quote} 15/01/17 09:13:59 ERROR scheduler.JobScheduler: Error generating jobs for time 1421507639000 ms java.lang.ClassCastException: java.lang.Object cannot be cast to org.apache.hadoop.mapreduce.InputFormat at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:91) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:203) at org.apache.spark.streaming.dstream.FileInputDStream$$anonfun$3.apply(FileInputDStream.scala:236) at org.apache.spark.streaming.dstream.FileInputDStream$$anonfun$3.apply(FileInputDStream.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.AbstractTraversable.map(Traversable.scala:105) at org.apache.spark.streaming.dstream.FileInputDStream.org$apache$spark$streaming$dstream$FileInputDStream$$filesToRDD(FileInputDStream.scala:234) at org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:128) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:296) at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:288) at scala.Option.orElse(Option.scala:257) {quote} The same classes K and V work fine for non-streaming Spark: {code} spark_context.newAPIHadoopFile(path,F.class,K.class,SequenceFileInputFormat.class,conf) {code} Also, streaming works fine for TextFileInputFormat. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4894) Add Bernoulli-variant of Naive Bayes
[ https://issues.apache.org/jira/browse/SPARK-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281441#comment-14281441 ] Apache Spark commented on SPARK-4894: - User 'leahmcguire' has created a pull request for this issue: https://github.com/apache/spark/pull/4087 Add Bernoulli-variant of Naive Bayes Key: SPARK-4894 URL: https://issues.apache.org/jira/browse/SPARK-4894 Project: Spark Issue Type: New Feature Components: MLlib Affects Versions: 1.2.0 Reporter: RJ Nowling Assignee: RJ Nowling MLlib only supports the multinomial-variant of Naive Bayes. The Bernoulli version of Naive Bayes is more useful for situations where the features are binary values. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4894) Add Bernoulli-variant of Naive Bayes
[ https://issues.apache.org/jira/browse/SPARK-4894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281444#comment-14281444 ] Leah McGuire commented on SPARK-4894: - Hi [~rnowling], I submitted a pull request to add just the Bernoulli NB to the existing code. It follows the outline you suggested above, with the exception that I used an enumeration for the model type rather than a simple string. If you would have time to review it I would appreciate the feedback! Add Bernoulli-variant of Naive Bayes Key: SPARK-4894 URL: https://issues.apache.org/jira/browse/SPARK-4894 Project: Spark Issue Type: New Feature Components: MLlib Affects Versions: 1.2.0 Reporter: RJ Nowling Assignee: RJ Nowling MLlib only supports the multinomial-variant of Naive Bayes. The Bernoulli version of Naive Bayes is more useful for situations where the features are binary values. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
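For readers unfamiliar with the distinction, a self-contained sketch of the Bernoulli scoring rule in plain Scala (not the MLlib API from the pull request): absent features contribute a log(1 - p) term, which the multinomial variant has no equivalent of.
{code}
import scala.math.log

// Score one class for a binary feature vector under Bernoulli naive Bayes:
//   log P(c) + sum_i [ x_i * log(p_ci) + (1 - x_i) * log(1 - p_ci) ]
// where p_ci is the estimated probability that feature i is present in class c.
def bernoulliLogScore(logPrior: Double, featureProb: Array[Double], x: Array[Double]): Double = {
  require(featureProb.length == x.length)
  var score = logPrior
  var i = 0
  while (i < x.length) {
    score += (if (x(i) > 0.0) log(featureProb(i)) else log(1.0 - featureProb(i)))
    i += 1
  }
  score
}

// The multinomial variant, by contrast, only accumulates x_i * log(p_ci),
// so features that are absent (x_i = 0) carry no information.
{code}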
[jira] [Created] (SPARK-5298) Spark not starting on EC2 using spark-ec2
Grzegorz Dubicki created SPARK-5298: --- Summary: Spark not starting on EC2 using spark-ec2 Key: SPARK-5298 URL: https://issues.apache.org/jira/browse/SPARK-5298 Project: Spark Issue Type: Bug Reporter: Grzegorz Dubicki Spark doesn't start after creating it with: {noformat} ./spark-ec2 -k * -i * -s 1 --region=eu-west-1 --instance-type=t2.micro --spark-version=1.2.0 launch test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/f15caf9ff6c96ec69fee) ..or after stopping the instances on EC2 via AWS Console and starting the cluster with: {noformat} ./spark-ec2 -k * -i * --region=eu-west-1 start test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/8b87192b3aa4e0ed028c) Please note these errors in launch output: {noformat} ~/spark-ec2 Initializing spark ~ ~/spark-ec2 ERROR: Unknown Spark version Initializing shark ~ ~/spark-ec2 ~/spark-ec2 ERROR: Unknown Shark version {noformat} ..and then these in start output: {noformat} ./spark-standalone/setup.sh: line 26: /root/spark/sbin/stop-all.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 31: /root/spark/sbin/start-master.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 37: /root/spark/sbin/start-slaves.sh: Nie ma takiego pliku ani katalogu {noformat} (the error message is No such file or directory, in Polish) It seems to be related with http://mail-archives.us.apache.org/mod_mbox/spark-user/201412.mbox/%3cCAJ5A9B_U=mdcxyftdkbk+sljzbcdpcb0qqs83u0grozfgkc...@mail.gmail.com%3e - I also have almost empty Spark and Shark dirs on the master of test2 cluster: {noformat} root@ip-172-31-7-179 ~]$ ls spark conf work root@ip-172-31-7-179 ~]$ ls shark/ conf {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5298) Spark not starting on EC2 using spark-ec2
[ https://issues.apache.org/jira/browse/SPARK-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grzegorz Dubicki updated SPARK-5298: Environment: I use Spark 1.2.0 + this PR https://github.com/mesos/spark-ec2/pull/76 from my fork https://github.com/grzegorz-dubicki/spark and v4 Spark EC2 script with the same fix from https://github.com/grzegorz-dubicki/spark-ec2 Affects Version/s: 1.2.0 Spark not starting on EC2 using spark-ec2 - Key: SPARK-5298 URL: https://issues.apache.org/jira/browse/SPARK-5298 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Environment: I use Spark 1.2.0 + this PR https://github.com/mesos/spark-ec2/pull/76 from my fork https://github.com/grzegorz-dubicki/spark and v4 Spark EC2 script with the same fix from https://github.com/grzegorz-dubicki/spark-ec2 Reporter: Grzegorz Dubicki Spark doesn't start after creating it with: {noformat} ./spark-ec2 -k * -i * -s 1 --region=eu-west-1 --instance-type=t2.micro --spark-version=1.2.0 launch test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/f15caf9ff6c96ec69fee) ..or after stopping the instances on EC2 via AWS Console and starting the cluster with: {noformat} ./spark-ec2 -k * -i * --region=eu-west-1 start test2 {noformat} (Output: https://gist.github.com/grzegorz-dubicki/8b87192b3aa4e0ed028c) Please note these errors in launch output: {noformat} ~/spark-ec2 Initializing spark ~ ~/spark-ec2 ERROR: Unknown Spark version Initializing shark ~ ~/spark-ec2 ~/spark-ec2 ERROR: Unknown Shark version {noformat} ..and then these in start output: {noformat} ./spark-standalone/setup.sh: line 26: /root/spark/sbin/stop-all.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 31: /root/spark/sbin/start-master.sh: Nie ma takiego pliku ani katalogu ./spark-standalone/setup.sh: line 37: /root/spark/sbin/start-slaves.sh: Nie ma takiego pliku ani katalogu {noformat} (the error message is No such file or directory, in Polish) It seems to be related with http://mail-archives.us.apache.org/mod_mbox/spark-user/201412.mbox/%3cCAJ5A9B_U=mdcxyftdkbk+sljzbcdpcb0qqs83u0grozfgkc...@mail.gmail.com%3e - I also have almost empty Spark and Shark dirs on the master of test2 cluster: {noformat} root@ip-172-31-7-179 ~]$ ls spark conf work root@ip-172-31-7-179 ~]$ ls shark/ conf {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5299) Is http://www.apache.org/dist/spark/KEYS out of date?
David Shaw created SPARK-5299: - Summary: Is http://www.apache.org/dist/spark/KEYS out of date? Key: SPARK-5299 URL: https://issues.apache.org/jira/browse/SPARK-5299 Project: Spark Issue Type: Question Components: Deploy Reporter: David Shaw The keys contained in http://www.apache.org/dist/spark/KEYS do not appear to match the keys used to sign the releases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org