[jira] [Created] (BEAM-2962) Placeholders like [Sequential Graph Graphic] appear in documentation instead of images

2017-09-16 Thread Aseem Bansal (JIRA)
Aseem Bansal created BEAM-2962:
--

 Summary: Placeholders like [Sequential Graph Graphic] appear in 
documentation instead of images
 Key: BEAM-2962
 URL: https://issues.apache.org/jira/browse/BEAM-2962
 Project: Beam
  Issue Type: Bug
  Components: website
Reporter: Aseem Bansal
Assignee: Reuven Lax


I was reading the documentation at 
https://beam.apache.org/documentation/programming-guide/ and saw the following:


{noformat}
The resulting workflow graph of the above pipeline looks like this:

[Sequential Graph Graphic]
{noformat}

Looking at the above, it seems that the text [Sequential Graph Graphic] is a 
placeholder that was supposed to be replaced by an image but was not. 
There are other places on this page where text appears inside [ .. ] and an 
image also seems to be missing.





[jira] [Comment Edited] (SPARK-21483) Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in Encoders.bean(Vector.class)

2017-07-20 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094607#comment-16094607
 ] 

Aseem Bansal edited comment on SPARK-21483 at 7/20/17 12:29 PM:


Some pseudo code to show what I am trying to achieve

{code:java}
class MyTransformer implements Serializable {

    public FeaturesAndLabel transform(RawData rawData) {
        // Some logic which creates features and labels from raw data.
        // RawData is just a Java bean.
        // FeaturesAndLabel is a bean which contains a SparseVector as
        // features and a double as label.
    }
}
{code}

{code:java}
Dataset<RawData> dataset = ...; // read from somewhere and create a Dataset of RawData beans
Dataset<FeaturesAndLabel> featuresAndLabels =
        dataset.transform(new MyTransformer()::transform);

// use features and labels for machine learning
{code}



was (Author: anshbansal):
Some pseudo code to show what I am trying to achieve

{code:java}
class MyTransformer implements Serializable {

    public FeaturesAndLabel transform(RawData rawData) {
        // Some logic which creates features and labels from raw data.
        // FeaturesAndLabel is a bean which contains a SparseVector as
        // features and a double as label.
    }
}
{code}

{code:java}
Dataset<RawData> dataset = ...; // read from somewhere and create a Dataset of RawData beans
Dataset<FeaturesAndLabel> featuresAndLabels =
        dataset.transform(new MyTransformer()::transform);

// use features and labels for machine learning
{code}


> Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in 
> Encoders.bean(Vector.class)
> --
>
> Key: SPARK-21483
> URL: https://issues.apache.org/jira/browse/SPARK-21483
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Aseem Bansal
>Priority: Minor
>
> The class org.apache.spark.ml.linalg.Vector is currently not bean-compliant 
> as per Spark.
> This makes it impossible to create a Vector via a dataset.transform. It should 
> be made bean-compliant so it can be used.






[jira] [Commented] (SPARK-21483) Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in Encoders.bean(Vector.class)

2017-07-20 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094607#comment-16094607
 ] 

Aseem Bansal commented on SPARK-21483:
--

Some pseudo code to show what I am trying to achieve

{code:java}
class MyTransformer implements Serializable {

    public FeaturesAndLabel transform(RawData rawData) {
        // Some logic which creates features and labels from raw data.
        // FeaturesAndLabel is a bean which contains a SparseVector as
        // features and a double as label.
    }
}
{code}

{code:java}
Dataset<RawData> dataset = ...; // read from somewhere and create a Dataset of RawData beans
Dataset<FeaturesAndLabel> featuresAndLabels =
        dataset.transform(new MyTransformer()::transform);

// use features and labels for machine learning
{code}


> Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in 
> Encoders.bean(Vector.class)
> --
>
> Key: SPARK-21483
> URL: https://issues.apache.org/jira/browse/SPARK-21483
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Aseem Bansal
>Priority: Minor
>
> The class org.apache.spark.ml.linalg.Vector is currently not bean-compliant 
> as per Spark.
> This makes it impossible to create a Vector via a dataset.transform. It should 
> be made bean-compliant so it can be used.






[jira] [Comment Edited] (SPARK-21483) Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in Encoders.bean(Vector.class)

2017-07-20 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094314#comment-16094314
 ] 

Aseem Bansal edited comment on SPARK-21483 at 7/20/17 9:11 AM:
---

No, it does not. Can you give a link to what you are referring to? And I am not 
using Spark SQL; I am using the Dataset API's transformations only.


was (Author: anshbansal):
Now it does not. Can you give a link to what you are referring to? And I am not 
using spark SQL. I am using Dataset's transformations only.

> Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in 
> Encoders.bean(Vector.class)
> --
>
> Key: SPARK-21483
> URL: https://issues.apache.org/jira/browse/SPARK-21483
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Aseem Bansal
>Priority: Minor
>
> The class org.apache.spark.ml.linalg.Vector is currently not bean-compliant 
> as per Spark.
> This makes it impossible to create a Vector via a dataset.transform. It should 
> be made bean-compliant so it can be used.






[jira] [Commented] (SPARK-21482) Make LabeledPoint bean-compliant so it can be used in Encoders.bean(LabeledPoint.class)

2017-07-20 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094315#comment-16094315
 ] 

Aseem Bansal commented on SPARK-21482:
--

There is a LabeledPoint in the new ml API too: 
https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.ml.feature.LabeledPoint

I am able to work around this by using my own class, but I thought the ML 
package was supposed to be used with the Dataset API. That's why I am saying it 
should support this.
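
For reference, a minimal sketch of that workaround, assuming a hypothetical bean 
with getters, setters, and a no-arg constructor so that Encoders.bean can infer 
a schema (the field names and the use of double[] instead of ml.linalg.Vector 
are illustration only, not my actual class):

{code:java}
import java.io.Serializable;

public class MyLabeledPoint implements Serializable {
    private double label;
    private double[] features;   // raw values instead of a (non-bean) ml Vector

    public MyLabeledPoint() {}   // no-arg constructor required by bean encoders

    public double getLabel() { return label; }
    public void setLabel(double label) { this.label = label; }

    public double[] getFeatures() { return features; }
    public void setFeatures(double[] features) { this.features = features; }
}
{code}

Such a class works with Encoders.bean(MyLabeledPoint.class), which is exactly 
what does not work with the ml LabeledPoint today.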



> Make LabeledPoint bean-compliant so it can be used in 
> Encoders.bean(LabeledPoint.class)
> ---
>
> Key: SPARK-21482
> URL: https://issues.apache.org/jira/browse/SPARK-21482
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Aseem Bansal
>Priority: Minor
>
> The LabeledPoint class is currently not bean-compliant as per Spark:
> https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.ml.feature.LabeledPoint
> This makes it impossible to create a LabeledPoint via a dataset.transform. It 
> should be made bean-compliant so it can be used.






[jira] [Commented] (SPARK-21483) Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in Encoders.bean(Vector.class)

2017-07-20 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094314#comment-16094314
 ] 

Aseem Bansal commented on SPARK-21483:
--

Now it does not. Can you give a link to what you are referring to? And I am not 
using spark SQL. I am using Dataset's transformations only.

> Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in 
> Encoders.bean(Vector.class)
> --
>
> Key: SPARK-21483
> URL: https://issues.apache.org/jira/browse/SPARK-21483
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Aseem Bansal
>Priority: Minor
>
> The class org.apache.spark.ml.linalg.Vector is currently not bean-compliant 
> as per Spark.
> This makes it impossible to create a Vector via a dataset.transform. It should 
> be made bean-compliant so it can be used.






[jira] [Commented] (SPARK-21483) Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in Encoders.bean(Vector.class)

2017-07-20 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094297#comment-16094297
 ] 

Aseem Bansal commented on SPARK-21483:
--

How would you encode it otherwise?

> Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in 
> Encoders.bean(Vector.class)
> --
>
> Key: SPARK-21483
> URL: https://issues.apache.org/jira/browse/SPARK-21483
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Aseem Bansal
>Priority: Minor
>
> The class org.apache.spark.ml.linalg.Vector is currently not bean-compliant 
> as per Spark.
> This makes it impossible to create a Vector via a dataset.transform. It should 
> be made bean-compliant so it can be used.






[jira] [Commented] (SPARK-21482) Make LabeledPoint bean-compliant so it can be used in Encoders.bean(LabeledPoint.class)

2017-07-20 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16094295#comment-16094295
 ] 

Aseem Bansal commented on SPARK-21482:
--

I am using the Java API. I tried a simple transformation with 

{noformat}
dataset.transform(MyCustomToLabeledPointTransformer::transformer, 
Encoders.bean(LabeledPoint.class))
{noformat}

and it threw a bean-compliance exception. I am not sure whether the encoders 
should act on beans or not, but clearly something is happening that causes them 
to act on beans.

> Make LabeledPoint bean-compliant so it can be used in 
> Encoders.bean(LabeledPoint.class)
> ---
>
> Key: SPARK-21482
> URL: https://issues.apache.org/jira/browse/SPARK-21482
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.1.0
>Reporter: Aseem Bansal
>Priority: Minor
>
> The LabeledPoint class is currently not bean-compliant as per Spark:
> https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.ml.feature.LabeledPoint
> This makes it impossible to create a LabeledPoint via a dataset.transform. It 
> should be made bean-compliant so it can be used.






[jira] [Created] (SPARK-21483) Make org.apache.spark.ml.linalg.Vector bean-compliant so it can be used in Encoders.bean(Vector.class)

2017-07-20 Thread Aseem Bansal (JIRA)
Aseem Bansal created SPARK-21483:


 Summary: Make org.apache.spark.ml.linalg.Vector bean-compliant so 
it can be used in Encoders.bean(Vector.class)
 Key: SPARK-21483
 URL: https://issues.apache.org/jira/browse/SPARK-21483
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 2.1.0
Reporter: Aseem Bansal


The class org.apache.spark.ml.linalg.Vector is currently not bean-compliant as 
per Spark.

This makes it impossible to create a Vector via a dataset.transform. It should 
be made bean-compliant so it can be used.
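
A minimal sketch of the kind of call that fails today (the MapFunction body and 
the RawData type are placeholders for illustration; Encoders.bean(Vector.class) 
is the only part that matters here):

{code:java}
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.ml.linalg.Vector;
import org.apache.spark.ml.linalg.Vectors;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;

// Hypothetical row-level transformation producing an ml Vector per record.
Dataset<Vector> vectors = dataset.map(
        (MapFunction<RawData, Vector>) row -> Vectors.dense(1.0, 2.0),
        Encoders.bean(Vector.class));   // fails because Vector is not a Java bean
{code}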






[jira] [Created] (SPARK-21482) Make LabeledPoint bean-compliant so it can be used in Encoders.bean(LabeledPoint.class)

2017-07-20 Thread Aseem Bansal (JIRA)
Aseem Bansal created SPARK-21482:


 Summary: Make LabeledPoint bean-compliant so it can be used in 
Encoders.bean(LabeledPoint.class)
 Key: SPARK-21482
 URL: https://issues.apache.org/jira/browse/SPARK-21482
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 2.1.0
Reporter: Aseem Bansal


The LabeledPoint class is currently not bean-compliant as per Spark:
https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.ml.feature.LabeledPoint

This makes it impossible to create a LabeledPoint via a dataset.transform. It 
should be made bean-compliant so it can be used.






[jira] [Created] (SPARK-21481) Add indexOf method in ml.feature.HashingTF similar to mllib.feature.HashingTF

2017-07-20 Thread Aseem Bansal (JIRA)
Aseem Bansal created SPARK-21481:


 Summary: Add indexOf method in ml.feature.HashingTF similar to 
mllib.feature.HashingTF
 Key: SPARK-21481
 URL: https://issues.apache.org/jira/browse/SPARK-21481
 Project: Spark
  Issue Type: Improvement
  Components: ML
Affects Versions: 2.2.0, 2.1.0
Reporter: Aseem Bansal


If we want to find the index of an input based on the hashing trick, it is 
possible with 
https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.mllib.feature.HashingTF
 but not with 
https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.ml.feature.HashingTF.

The ml version should allow that for feature parity.
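
As a possible interim workaround (an assumption on my part, not a documented 
equivalence), the mllib HashingTF can be configured to match the ml transformer 
and queried for the index directly:

{code:java}
import org.apache.spark.mllib.feature.HashingTF;

// numFeatures must match the ml HashingTF's setNumFeatures value
// (1 << 18 is the default); "murmur3" is the ml default hash algorithm in 2.x.
HashingTF hashingTF = new HashingTF(1 << 18);
hashingTF.setHashAlgorithm("murmur3");
int index = hashingTF.indexOf("someTerm");
{code}

Having indexOf directly on ml.feature.HashingTF would remove the need to keep 
these settings in sync by hand.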






[jira] [Updated] (SPARK-21473) Running Transform on a bean which has only setters gives NullPointerException

2017-07-19 Thread Aseem Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aseem Bansal updated SPARK-21473:
-
Description: 
If I run the following using the Java API

{code:java}
dataset.map(Transformer::transform, 
Encoders.bean(BeanWithOnlySettersAndNoGetters.class));
{code}

Then I get the exception below. I understand that the bean is not compliant 
without the getters, but the exception is misleading. Perhaps fixing the 
exception message would be a solution?

{noformat}
Caused by: java.lang.NullPointerException
at org.spark_project.guava.reflect.TypeToken.method(TypeToken.java:465)
at 
org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:126)
at 
org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at 
org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:125)
at 
org.apache.spark.sql.catalyst.JavaTypeInference$.inferDataType(JavaTypeInference.scala:55)
at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:89)
at org.apache.spark.sql.Encoders$.bean(Encoders.scala:142)
at org.apache.spark.sql.Encoders.bean(Encoders.scala)
{noformat}


  was:
If I run the following 

{code:java}
dataset.map(Transformer::transform, 
Encoders.bean(BeanWithOnlySettersAndNoGetters.class));
{code}

Then I get the exception below. I understand that the bean is not compliant 
without the getters, but the exception is misleading. Perhaps fixing the 
exception message would be a solution?

{noformat}
Caused by: java.lang.NullPointerException
at org.spark_project.guava.reflect.TypeToken.method(TypeToken.java:465)
at 
org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:126)
at 
org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at 
org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:125)
at 
org.apache.spark.sql.catalyst.JavaTypeInference$.inferDataType(JavaTypeInference.scala:55)
at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:89)
at org.apache.spark.sql.Encoders$.bean(Encoders.scala:142)
at org.apache.spark.sql.Encoders.bean(Encoders.scala)
{noformat}



> Running Transform on a bean which has only setters gives NullPointerException
> -
>
> Key: SPARK-21473
> URL: https://issues.apache.org/jira/browse/SPARK-21473
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Aseem Bansal
>
> If I run the following using the Java API
> {code:java}
> dataset.map(Transformer::transform, 
> Encoders.bean(BeanWithOnlySettersAndNoGetters.class));
> {code}
> Then I get the exception below. I understand that the bean is not compliant 
> without the getters, but the exception is misleading. Perhaps fixing the 
> exception message would be a solution?
> {noformat}
> Caused by: java.lang.NullPointerException
>   at org.spark_project.guava.reflect.TypeToken.method(TypeToken.java:465)
>   at 
> org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:126)
>   at 
> org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> 

[jira] [Created] (SPARK-21473) Running Transform on a bean which has only setters gives NullPointerException

2017-07-19 Thread Aseem Bansal (JIRA)
Aseem Bansal created SPARK-21473:


 Summary: Running Transform on a bean which has only setters gives 
NullPointerException
 Key: SPARK-21473
 URL: https://issues.apache.org/jira/browse/SPARK-21473
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.1.0
Reporter: Aseem Bansal


If I run the following 

{code:java}
dataset.map(Transformer::transform, 
Encoders.bean(BeanWithOnlySettersAndNoGetters.class));
{code}

Then I get the exception below. I understand that the bean is not compliant 
without the getters, but the exception is misleading. Perhaps fixing the 
exception message would be a solution?

{noformat}
Caused by: java.lang.NullPointerException
at org.spark_project.guava.reflect.TypeToken.method(TypeToken.java:465)
at 
org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:126)
at 
org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at 
org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:125)
at 
org.apache.spark.sql.catalyst.JavaTypeInference$.inferDataType(JavaTypeInference.scala:55)
at 
org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:89)
at org.apache.spark.sql.Encoders$.bean(Encoders.scala:142)
at org.apache.spark.sql.Encoders.bean(Encoders.scala)
{noformat}
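
For completeness, a hypothetical bean of the shape that triggers this (setters 
only, no getters); the field is illustrative:

{code:java}
import java.io.Serializable;

public class BeanWithOnlySettersAndNoGetters implements Serializable {
    private String name;

    // Only a setter is defined. Presumably the NPE happens because type
    // inference looks up a read method (getter) for the property and finds
    // none. Adding "public String getName() { return name; }" avoids it,
    // but a clear error message would be much friendlier than the NPE.
    public void setName(String name) {
        this.name = name;
    }
}
{code}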







[jira] [Commented] (SPARK-17742) Spark Launcher does not get failed state in Listener

2017-04-04 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954714#comment-15954714
 ] 

Aseem Bansal commented on SPARK-17742:
--

[~daanvdn] We ended up using Kafka messages to let the web app that launched 
the job (via the launcher) know whether the job had completed or failed. We 
dropped the Launcher's states since they are broken.
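
Roughly what that looks like (topic name, broker address, and the helper class 
are all made up for illustration; the real setup will differ):

{code:java}
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class JobStatusReporter {
    // Called from the Spark job's main method instead of relying on
    // SparkAppHandle state transitions: report success/failure explicitly.
    public static void report(String jobId, String status) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("job-status", jobId, status));
        }
    }
}
{code}

The web app then consumes the "job-status" topic instead of listening to the 
launcher.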

> Spark Launcher does not get failed state in Listener 
> -
>
> Key: SPARK-17742
> URL: https://issues.apache.org/jira/browse/SPARK-17742
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> I tried to launch an application using the code below. This is dummy code to 
> reproduce the problem. I tried exiting Spark with status -1, throwing an 
> exception, etc., but in no case did the listener give me a FAILED state. If a 
> Spark job returns -1 or throws an exception from the main method, it should 
> be considered a failure. 
> {code}
> package com.example;
> import org.apache.spark.launcher.SparkAppHandle;
> import org.apache.spark.launcher.SparkLauncher;
> import java.io.IOException;
> public class Main2 {
> public static void main(String[] args) throws IOException, 
> InterruptedException {
> SparkLauncher launcher = new SparkLauncher()
> .setSparkHome("/opt/spark2")
> 
> .setAppResource("/home/aseem/projects/testsparkjob/build/libs/testsparkjob-1.0-SNAPSHOT.jar")
> .setMainClass("com.example.Main")
> .setMaster("local[2]");
> launcher.startApplication(new MyListener());
> Thread.sleep(1000 * 60);
> }
> }
> class MyListener implements SparkAppHandle.Listener {
> @Override
> public void stateChanged(SparkAppHandle handle) {
> System.out.println("state changed " + handle.getState());
> }
> @Override
> public void infoChanged(SparkAppHandle handle) {
> System.out.println("info changed " + handle.getState());
> }
> }
> {code}
> The spark job is 
> {code}
> package com.example;
> import org.apache.spark.sql.SparkSession;
> import java.io.IOException;
> public class Main {
> public static void main(String[] args) throws IOException {
> SparkSession sparkSession = SparkSession
> .builder()
> .appName("" + System.currentTimeMillis())
> .getOrCreate();
> try {
> for (int i = 0; i < 15; i++) {
> Thread.sleep(1000);
> System.out.println("sleeping 1");
> }
> } catch (InterruptedException e) {
> e.printStackTrace();
> }
> //sparkSession.stop();
> System.exit(-1);
> }
> }
> {code}






[jira] [Commented] (SPARK-10413) ML models should support prediction on single instances

2017-02-08 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15857888#comment-15857888
 ] 

Aseem Bansal commented on SPARK-10413:
--

Something to look at would be https://github.com/combust/mleap, which provides 
this on top of Spark.

> ML models should support prediction on single instances
> ---
>
> Key: SPARK-10413
> URL: https://issues.apache.org/jira/browse/SPARK-10413
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML
>Reporter: Xiangrui Meng
>Priority: Critical
>
> Currently models in the pipeline API only implement transform(DataFrame). It 
> would be quite useful to support prediction on single instance.
> UPDATE: This issue is for making predictions with single models.  We can make 
> methods like {{def predict(features: Vector): Double}} public.
> * This issue is *not* for single-instance prediction for full Pipelines, 
> which would require making predictions on {{Row}}s.






[jira] [Commented] (SPARK-19449) Inconsistent results between ml package RandomForestClassificationModel and mllib package RandomForestModel

2017-02-05 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853629#comment-15853629
 ] 

Aseem Bansal commented on SPARK-19449:
--

[~sowen] My results are actually deterministic. No matter how many times I run 
it, the numbers of true positives, true negatives, false positives, and false 
negatives are always exactly the same. The problem is that the two 
implementations also always disagree by exactly the same amount.

> Inconsistent results between ml package RandomForestClassificationModel and 
> mllib package RandomForestModel
> ---
>
> Key: SPARK-19449
> URL: https://issues.apache.org/jira/browse/SPARK-19449
> Project: Spark
>  Issue Type: Bug
>  Components: ML, MLlib
>Affects Versions: 2.1.0
>Reporter: Aseem Bansal
>
> I worked on some code to convert the ml package RandomForestClassificationModel 
> to the mllib package RandomForestModel. It was needed because we need to make 
> predictions on the order of milliseconds. I found that the results are 
> inconsistent although the underlying DecisionTreeModels are exactly the same, 
> so the two implementations behave inconsistently, which should not be the case.
> The code below can be used to reproduce the issue. It can be run as a simple 
> Java app as long as you have the Spark dependencies set up properly.
> {noformat}
> import org.apache.spark.ml.Transformer;
> import org.apache.spark.ml.classification.*;
> import org.apache.spark.ml.linalg.*;
> import org.apache.spark.ml.regression.RandomForestRegressionModel;
> import org.apache.spark.mllib.linalg.DenseVector;
> import org.apache.spark.mllib.linalg.Vector;
> import org.apache.spark.mllib.tree.configuration.Algo;
> import org.apache.spark.mllib.tree.model.DecisionTreeModel;
> import org.apache.spark.mllib.tree.model.RandomForestModel;
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.RowFactory;
> import org.apache.spark.sql.SparkSession;
> import org.apache.spark.sql.types.DataTypes;
> import org.apache.spark.sql.types.Metadata;
> import org.apache.spark.sql.types.StructField;
> import org.apache.spark.sql.types.StructType;
> import scala.Enumeration;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Random;
> abstract class Predictor {
> abstract double predict(Vector vector);
> }
> public class MainConvertModels {
> public static final int seed = 42;
> public static void main(String[] args) {
> int numRows = 1000;
> int numFeatures = 3;
> int numClasses = 2;
> double trainFraction = 0.8;
> double testFraction = 0.2;
> SparkSession spark = SparkSession.builder()
> .appName("conversion app")
> .master("local")
> .getOrCreate();
> Dataset data = getDummyData(spark, numRows, numFeatures, 
> numClasses);
> Dataset[] splits = data.randomSplit(new double[]{trainFraction, 
> testFraction}, seed);
> Dataset trainingData = splits[0];
> Dataset testData = splits[1];
> testData.cache();
> List labels = getLabels(testData);
> List features = getFeatures(testData);
> DecisionTreeClassifier classifier1 = new DecisionTreeClassifier();
> DecisionTreeClassificationModel model1 = 
> classifier1.fit(trainingData);
> final DecisionTreeModel convertedModel1 = 
> convertDecisionTreeModel(model1, Algo.Classification());
> RandomForestClassifier classifier = new RandomForestClassifier();
> RandomForestClassificationModel model2 = classifier.fit(trainingData);
> final RandomForestModel convertedModel2 = 
> convertRandomForestModel(model2);
> System.out.println(
> "** DecisionTreeClassifier\n" +
> "** Original **" + getInfo(model1, testData) + "\n" +
> "** New  **" + getInfo(new Predictor() {
> double predict(Vector vector) {return 
> convertedModel1.predict(vector);}
> }, labels, features) + "\n" +
> "\n" +
> "** RandomForestClassifier\n" +
> "** Original **" + getInfo(model2, testData) + "\n" +
> "** New  **" + getInfo(new Predictor() {double 
> predict(Vector vector) {return convertedModel2.predict(vector);}}, labels, 
> features) + "\n" +
> "\n" +
> "");
> }
> static Dataset getDummyData(SparkSession spark, int numberRows, int 
> numberFeatures, int labelUpperBound) {
> StructType schema = new StructType(new StructField[]{
> new StructField("label", 

[jira] [Commented] (SPARK-19444) Tokenizer example does not compile without extra imports

2017-02-03 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851588#comment-15851588
 ] 

Aseem Bansal commented on SPARK-19444:
--

https://github.com/apache/spark/pull/16789

> Tokenizer example does not compile without extra imports
> 
>
> Key: SPARK-19444
> URL: https://issues.apache.org/jira/browse/SPARK-19444
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 2.1.0
>Reporter: Aseem Bansal
>Priority: Minor
>
> The example at http://spark.apache.org/docs/2.1.0/ml-features.html#tokenizer 
> does not compile without the following static import
> import static org.apache.spark.sql.functions.*;






[jira] [Commented] (SPARK-19449) Inconsistent results between ml package RandomForestClassificationModel and mllib package RandomForestModel

2017-02-03 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851576#comment-15851576
 ] 

Aseem Bansal commented on SPARK-19449:
--

Doesn't the decision tree debug string print the tree as a series of IF-ELSE 
statements? I printed the debug string for the two random forest models and it 
was exactly the same. In other words, the two implementations should be 
mathematically equivalent. 

The random processes for selecting data should not cause any issues, as I 
ensured that the exact same data goes to both versions. It works for decision 
trees, and a random forest classifier is just a majority vote over a bunch of 
decision tree classifiers, so I cannot see how that could be different.
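
This is the check I mean, using the variable names from the reproduction code 
in the issue description (model2 is the ml RandomForestClassificationModel, 
convertedModel2 the converted mllib RandomForestModel):

{code:java}
// Both classes expose toDebugString; diffing the two outputs shows the
// underlying trees are identical.
System.out.println(model2.toDebugString());
System.out.println(convertedModel2.toDebugString());
{code}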

> Inconsistent results between ml package RandomForestClassificationModel and 
> mllib package RandomForestModel
> ---
>
> Key: SPARK-19449
> URL: https://issues.apache.org/jira/browse/SPARK-19449
> Project: Spark
>  Issue Type: Bug
>  Components: ML, MLlib
>Affects Versions: 2.1.0
>Reporter: Aseem Bansal
>
> I worked on some code to convert the ml package RandomForestClassificationModel 
> to the mllib package RandomForestModel. It was needed because we need to make 
> predictions on the order of milliseconds. I found that the results are 
> inconsistent although the underlying DecisionTreeModels are exactly the same, 
> so the two implementations behave inconsistently, which should not be the case.
> The code below can be used to reproduce the issue. It can be run as a simple 
> Java app as long as you have the Spark dependencies set up properly.
> {noformat}
> import org.apache.spark.ml.Transformer;
> import org.apache.spark.ml.classification.*;
> import org.apache.spark.ml.linalg.*;
> import org.apache.spark.ml.regression.RandomForestRegressionModel;
> import org.apache.spark.mllib.linalg.DenseVector;
> import org.apache.spark.mllib.linalg.Vector;
> import org.apache.spark.mllib.tree.configuration.Algo;
> import org.apache.spark.mllib.tree.model.DecisionTreeModel;
> import org.apache.spark.mllib.tree.model.RandomForestModel;
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.RowFactory;
> import org.apache.spark.sql.SparkSession;
> import org.apache.spark.sql.types.DataTypes;
> import org.apache.spark.sql.types.Metadata;
> import org.apache.spark.sql.types.StructField;
> import org.apache.spark.sql.types.StructType;
> import scala.Enumeration;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Random;
> abstract class Predictor {
> abstract double predict(Vector vector);
> }
> public class MainConvertModels {
> public static final int seed = 42;
> public static void main(String[] args) {
> int numRows = 1000;
> int numFeatures = 3;
> int numClasses = 2;
> double trainFraction = 0.8;
> double testFraction = 0.2;
> SparkSession spark = SparkSession.builder()
> .appName("conversion app")
> .master("local")
> .getOrCreate();
> Dataset data = getDummyData(spark, numRows, numFeatures, 
> numClasses);
> Dataset[] splits = data.randomSplit(new double[]{trainFraction, 
> testFraction}, seed);
> Dataset trainingData = splits[0];
> Dataset testData = splits[1];
> testData.cache();
> List labels = getLabels(testData);
> List features = getFeatures(testData);
> DecisionTreeClassifier classifier1 = new DecisionTreeClassifier();
> DecisionTreeClassificationModel model1 = 
> classifier1.fit(trainingData);
> final DecisionTreeModel convertedModel1 = 
> convertDecisionTreeModel(model1, Algo.Classification());
> RandomForestClassifier classifier = new RandomForestClassifier();
> RandomForestClassificationModel model2 = classifier.fit(trainingData);
> final RandomForestModel convertedModel2 = 
> convertRandomForestModel(model2);
> System.out.println(
> "** DecisionTreeClassifier\n" +
> "** Original **" + getInfo(model1, testData) + "\n" +
> "** New  **" + getInfo(new Predictor() {
> double predict(Vector vector) {return 
> convertedModel1.predict(vector);}
> }, labels, features) + "\n" +
> "\n" +
> "** RandomForestClassifier\n" +
> "** Original **" + getInfo(model2, testData) + "\n" +
> "** New  **" + getInfo(new Predictor() {double 
> predict(Vector vector) {return convertedModel2.predict(vector);}}, labels, 
> features) + "\n" +
> "\n" +
> "");
> }
>   

[jira] [Commented] (SPARK-19449) Inconsistent results between ml package RandomForestClassificationModel and mllib package RandomForestModel

2017-02-03 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851568#comment-15851568
 ] 

Aseem Bansal commented on SPARK-19449:
--

[~srowen]
I removed some extra code. The part where I do the conversion is at the end, in 
the convertRandomForestModel method.

Basically, the above code does this:
- Prepare 1000 rows of data with 3 features randomly, and 1000 labels randomly. 
I am not working on creating the model but on the conversion, so having random 
data is not an issue. It will just be a horrible model.
- Split the data in an 80/20 ratio for training/test.
- Train the ml versions of the decision tree model and random forest model on 
the training set. Let's call them DT1 and RF1.
- Convert these to the mllib versions of the models. Let's call them DT2 and RF2.
- Use the test set to predict labels using DT1, DT2, RF1, RF2. 
- Compare the predicted labels of DT1 with DT2: same results.
- Compare the predicted labels of RF1 with RF2: different results.

There should not be any randomness here, as I have used seeds for the random 
number generators everywhere and then used the exact same data for making 
predictions with all 4 models. 

> Inconsistent results between ml package RandomForestClassificationModel and 
> mllib package RandomForestModel
> ---
>
> Key: SPARK-19449
> URL: https://issues.apache.org/jira/browse/SPARK-19449
> Project: Spark
>  Issue Type: Bug
>  Components: ML, MLlib
>Affects Versions: 2.1.0
>Reporter: Aseem Bansal
>
> I worked on some code to convert the ml package RandomForestClassificationModel 
> to the mllib package RandomForestModel. It was needed because we need to make 
> predictions on the order of milliseconds. I found that the results are 
> inconsistent although the underlying DecisionTreeModels are exactly the same, 
> so the two implementations behave inconsistently, which should not be the case.
> The code below can be used to reproduce the issue. It can be run as a simple 
> Java app as long as you have the Spark dependencies set up properly.
> {noformat}
> import org.apache.spark.ml.Transformer;
> import org.apache.spark.ml.classification.*;
> import org.apache.spark.ml.linalg.*;
> import org.apache.spark.ml.regression.RandomForestRegressionModel;
> import org.apache.spark.mllib.linalg.DenseVector;
> import org.apache.spark.mllib.linalg.Vector;
> import org.apache.spark.mllib.tree.configuration.Algo;
> import org.apache.spark.mllib.tree.model.DecisionTreeModel;
> import org.apache.spark.mllib.tree.model.RandomForestModel;
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.RowFactory;
> import org.apache.spark.sql.SparkSession;
> import org.apache.spark.sql.types.DataTypes;
> import org.apache.spark.sql.types.Metadata;
> import org.apache.spark.sql.types.StructField;
> import org.apache.spark.sql.types.StructType;
> import scala.Enumeration;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Random;
> abstract class Predictor {
> abstract double predict(Vector vector);
> }
> public class MainConvertModels {
> public static final int seed = 42;
> public static void main(String[] args) {
> int numRows = 1000;
> int numFeatures = 3;
> int numClasses = 2;
> double trainFraction = 0.8;
> double testFraction = 0.2;
> SparkSession spark = SparkSession.builder()
> .appName("conversion app")
> .master("local")
> .getOrCreate();
> Dataset data = getDummyData(spark, numRows, numFeatures, 
> numClasses);
> Dataset[] splits = data.randomSplit(new double[]{trainFraction, 
> testFraction}, seed);
> Dataset trainingData = splits[0];
> Dataset testData = splits[1];
> testData.cache();
> List labels = getLabels(testData);
> List features = getFeatures(testData);
> DecisionTreeClassifier classifier1 = new DecisionTreeClassifier();
> DecisionTreeClassificationModel model1 = 
> classifier1.fit(trainingData);
> final DecisionTreeModel convertedModel1 = 
> convertDecisionTreeModel(model1, Algo.Classification());
> RandomForestClassifier classifier = new RandomForestClassifier();
> RandomForestClassificationModel model2 = classifier.fit(trainingData);
> final RandomForestModel convertedModel2 = 
> convertRandomForestModel(model2);
> System.out.println(
> "** DecisionTreeClassifier\n" +
> "** Original **" + getInfo(model1, testData) + "\n" +
> "** New  **" + getInfo(new Predictor() {
> double predict(Vector vector) {return 
> convertedModel1.predict(vector);}
>   

[jira] [Updated] (SPARK-19449) Inconsistent results between ml package RandomForestClassificationModel and mllib package RandomForestModel

2017-02-03 Thread Aseem Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aseem Bansal updated SPARK-19449:
-
Description: 
I worked on some code to convert the ml package RandomForestClassificationModel 
to the mllib package RandomForestModel. It was needed because we need to make 
predictions on the order of milliseconds. I found that the results are 
inconsistent although the underlying DecisionTreeModels are exactly the same, 
so the two implementations behave inconsistently, which should not be the case.

The code below can be used to reproduce the issue. It can be run as a simple 
Java app as long as you have the Spark dependencies set up properly.

{noformat}
import org.apache.spark.ml.Transformer;
import org.apache.spark.ml.classification.*;
import org.apache.spark.ml.linalg.*;
import org.apache.spark.ml.regression.RandomForestRegressionModel;
import org.apache.spark.mllib.linalg.DenseVector;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.tree.configuration.Algo;
import org.apache.spark.mllib.tree.model.DecisionTreeModel;
import org.apache.spark.mllib.tree.model.RandomForestModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import scala.Enumeration;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

abstract class Predictor {
abstract double predict(Vector vector);
}

public class MainConvertModels {

public static final int seed = 42;

public static void main(String[] args) {

int numRows = 1000;
int numFeatures = 3;
int numClasses = 2;

double trainFraction = 0.8;
double testFraction = 0.2;


SparkSession spark = SparkSession.builder()
.appName("conversion app")
.master("local")
.getOrCreate();


Dataset<Row> data = getDummyData(spark, numRows, numFeatures, 
numClasses);

Dataset<Row>[] splits = data.randomSplit(new double[]{trainFraction, 
testFraction}, seed);
Dataset<Row> trainingData = splits[0];
Dataset<Row> testData = splits[1];
testData.cache();

List<Double> labels = getLabels(testData);
List<Vector> features = getFeatures(testData);

DecisionTreeClassifier classifier1 = new DecisionTreeClassifier();
DecisionTreeClassificationModel model1 = classifier1.fit(trainingData);
final DecisionTreeModel convertedModel1 = 
convertDecisionTreeModel(model1, Algo.Classification());


RandomForestClassifier classifier = new RandomForestClassifier();
RandomForestClassificationModel model2 = classifier.fit(trainingData);
final RandomForestModel convertedModel2 = 
convertRandomForestModel(model2);

System.out.println(

"** DecisionTreeClassifier\n" +
"** Original **" + getInfo(model1, testData) + "\n" +
"** New  **" + getInfo(new Predictor() {
double predict(Vector vector) {return 
convertedModel1.predict(vector);}
}, labels, features) + "\n" +

"\n" +

"** RandomForestClassifier\n" +
"** Original **" + getInfo(model2, testData) + "\n" +
"** New  **" + getInfo(new Predictor() {double 
predict(Vector vector) {return convertedModel2.predict(vector);}}, labels, 
features) + "\n" +

"\n" +
"");
}

static Dataset<Row> getDummyData(SparkSession spark, int numberRows, int 
numberFeatures, int labelUpperBound) {

StructType schema = new StructType(new StructField[]{
new StructField("label", DataTypes.DoubleType, false, 
Metadata.empty()),
new StructField("features", new VectorUDT(), false, 
Metadata.empty())
});

double[][] vectors = prepareData(numberRows, numberFeatures);

Random random = new Random(seed);
List<Row> dataTest = new ArrayList<>();
for (double[] vector : vectors) {
double label = (double) random.nextInt(2);
dataTest.add(RowFactory.create(label, Vectors.dense(vector)));
}

return spark.createDataFrame(dataTest, schema);
}

static double[][] prepareData(int numRows, int numFeatures) {

Random random = new Random(seed);

double[][] result = new double[numRows][numFeatures];

for (int row = 0; row < numRows; row++) {
for (int feature = 0; feature < numFeatures; feature++) {
result[row][feature] = random.nextDouble();
}
}

return result;
}

static 

[jira] [Updated] (SPARK-19449) Inconsistent results between ml package RandomForestClassificationModel and mllib package RandomForestModel

2017-02-03 Thread Aseem Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aseem Bansal updated SPARK-19449:
-
Description: 
I worked on some code to convert the ml package RandomForestClassificationModel 
to the mllib package RandomForestModel. It was needed because we need to make 
predictions on the order of milliseconds. I found that the results are 
inconsistent although the underlying DecisionTreeModels are exactly the same. 

The code below can be used to reproduce the issue. It can be run as a simple 
Java app as long as you have the Spark dependencies set up properly.

{noformat}
import org.apache.spark.ml.Transformer;
import org.apache.spark.ml.classification.*;
import org.apache.spark.ml.linalg.*;
import org.apache.spark.ml.regression.RandomForestRegressionModel;
import org.apache.spark.mllib.linalg.DenseVector;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.tree.configuration.Algo;
import org.apache.spark.mllib.tree.model.DecisionTreeModel;
import org.apache.spark.mllib.tree.model.RandomForestModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import scala.Enumeration;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

abstract class Predictor {
abstract double predict(Vector vector);
}

public class MainConvertModels {

public static final int seed = 42;

public static void main(String[] args) {

int numRows = 1000;
int numFeatures = 3;
int numClasses = 2;

double trainFraction = 0.8;
double testFraction = 0.2;


SparkSession spark = SparkSession.builder()
.appName("conversion app")
.master("local")
.getOrCreate();

//Dataset data = getData(spark, "libsvm", 
"/opt/spark2/data/mllib/sample_libsvm_data.txt");
Dataset data = getDummyData(spark, numRows, numFeatures, 
numClasses);

Dataset[] splits = data.randomSplit(new double[]{trainFraction, 
testFraction}, seed);
Dataset trainingData = splits[0];
Dataset testData = splits[1];
testData.cache();

List labels = getLabels(testData);
List features = getFeatures(testData);

DecisionTreeClassifier classifier1 = new DecisionTreeClassifier();
DecisionTreeClassificationModel model1 = classifier1.fit(trainingData);
final DecisionTreeModel convertedModel1 = 
convertDecisionTreeModel(model1, Algo.Classification());


RandomForestClassifier classifier = new RandomForestClassifier();
RandomForestClassificationModel model2 = classifier.fit(trainingData);
final RandomForestModel convertedModel2 = 
convertRandomForestModel(model2);


LogisticRegression lr = new LogisticRegression();
LogisticRegressionModel model3 = lr.fit(trainingData);
final org.apache.spark.mllib.classification.LogisticRegressionModel 
convertedModel3 = convertLogisticRegressionModel(model3);


System.out.println(

"** DecisionTreeClassifier\n" +
"** Original **" + getInfo(model1, testData) + "\n" +
"** New  **" + getInfo(new Predictor() {
double predict(Vector vector) {return 
convertedModel1.predict(vector);}
}, labels, features) + "\n" +

"\n" +

"** RandomForestClassifier\n" +
"** Original **" + getInfo(model2, testData) + "\n" +
"** New  **" + getInfo(new Predictor() {double 
predict(Vector vector) {return convertedModel2.predict(vector);}}, labels, 
features) + "\n" +

"\n" +

"** LogisticRegression\n" +
"** Original **" + getInfo(model3, testData) + "\n" +
"** New  **" + getInfo(new Predictor() {double 
predict(Vector vector) { return convertedModel3.predict(vector);}}, labels, 
features) + "\n" +

"");
}

static Dataset getData(SparkSession spark, String format, String 
location) {

return spark.read()
.format(format)
.load(location);
}

static Dataset getDummyData(SparkSession spark, int numberRows, int 
numberFeatures, int labelUpperBound) {

StructType schema = new StructType(new StructField[]{
new StructField("label", DataTypes.DoubleType, false, 
Metadata.empty()),
new StructField("features", new VectorUDT(), false, 
Metadata.empty())
});

double[][] vectors = prepareData(numberRows, numberFeatures);

 

[jira] [Updated] (SPARK-19449) Inconsistent results between ml package RandomForestClassificationModel and mllib package RandomForestModel

2017-02-03 Thread Aseem Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aseem Bansal updated SPARK-19449:
-
Description: 
I worked on some code to convert the ml package RandomForestClassificationModel 
to the mllib package RandomForestModel. It was needed because we need to make 
predictions on the order of milliseconds. I found that the results are 
inconsistent although the underlying DecisionTreeModels are exactly the same, 
so the two implementations behave inconsistently, which should not be the case.

The code below can be used to reproduce the issue. It can be run as a simple 
Java app as long as you have the Spark dependencies set up properly.

{noformat}
import org.apache.spark.ml.Transformer;
import org.apache.spark.ml.classification.*;
import org.apache.spark.ml.linalg.*;
import org.apache.spark.ml.regression.RandomForestRegressionModel;
import org.apache.spark.mllib.linalg.DenseVector;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.tree.configuration.Algo;
import org.apache.spark.mllib.tree.model.DecisionTreeModel;
import org.apache.spark.mllib.tree.model.RandomForestModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import scala.Enumeration;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

abstract class Predictor {
abstract double predict(Vector vector);
}

public class MainConvertModels {

public static final int seed = 42;

public static void main(String[] args) {

int numRows = 1000;
int numFeatures = 3;
int numClasses = 2;

double trainFraction = 0.8;
double testFraction = 0.2;


SparkSession spark = SparkSession.builder()
.appName("conversion app")
.master("local")
.getOrCreate();

//Dataset data = getData(spark, "libsvm", 
"/opt/spark2/data/mllib/sample_libsvm_data.txt");
Dataset data = getDummyData(spark, numRows, numFeatures, 
numClasses);

Dataset[] splits = data.randomSplit(new double[]{trainFraction, 
testFraction}, seed);
Dataset trainingData = splits[0];
Dataset testData = splits[1];
testData.cache();

List labels = getLabels(testData);
List features = getFeatures(testData);

DecisionTreeClassifier classifier1 = new DecisionTreeClassifier();
DecisionTreeClassificationModel model1 = classifier1.fit(trainingData);
final DecisionTreeModel convertedModel1 = 
convertDecisionTreeModel(model1, Algo.Classification());


RandomForestClassifier classifier = new RandomForestClassifier();
RandomForestClassificationModel model2 = classifier.fit(trainingData);
final RandomForestModel convertedModel2 = 
convertRandomForestModel(model2);


LogisticRegression lr = new LogisticRegression();
LogisticRegressionModel model3 = lr.fit(trainingData);
final org.apache.spark.mllib.classification.LogisticRegressionModel 
convertedModel3 = convertLogisticRegressionModel(model3);


System.out.println(

"** DecisionTreeClassifier\n" +
"** Original **" + getInfo(model1, testData) + "\n" +
"** New  **" + getInfo(new Predictor() {
double predict(Vector vector) {return 
convertedModel1.predict(vector);}
}, labels, features) + "\n" +

"\n" +

"** RandomForestClassifier\n" +
"** Original **" + getInfo(model2, testData) + "\n" +
"** New  **" + getInfo(new Predictor() {double 
predict(Vector vector) {return convertedModel2.predict(vector);}}, labels, 
features) + "\n" +

"\n" +

"** LogisticRegression\n" +
"** Original **" + getInfo(model3, testData) + "\n" +
"** New  **" + getInfo(new Predictor() {double 
predict(Vector vector) { return convertedModel3.predict(vector);}}, labels, 
features) + "\n" +

"");
}

static Dataset getData(SparkSession spark, String format, String 
location) {

return spark.read()
.format(format)
.load(location);
}

static Dataset getDummyData(SparkSession spark, int numberRows, int 
numberFeatures, int labelUpperBound) {

StructType schema = new StructType(new StructField[]{
new StructField("label", DataTypes.DoubleType, false, 
Metadata.empty()),
new StructField("features", new VectorUDT(), false, 

[jira] [Created] (SPARK-19449) Inconsistent results between ml package RandomForestClassificationModel and mllib package RandomForestModel

2017-02-03 Thread Aseem Bansal (JIRA)
Aseem Bansal created SPARK-19449:


 Summary: Inconsistent results between ml package 
RandomForestClassificationModel and mllib package RandomForestModel
 Key: SPARK-19449
 URL: https://issues.apache.org/jira/browse/SPARK-19449
 Project: Spark
  Issue Type: Bug
  Components: ML, MLlib
Affects Versions: 2.1.0
Reporter: Aseem Bansal


I worked on some code to convert ml package RandomForestClassificationModel to 
mllib package RandomForestModel. It was needed because we need to make 
predictions on the order of ms. I found that the results are inconsistent 
although the underlying DecisionTreeModel are exactly the same. 

The below code can be used to reproduce the issue. 

{noformat}
import org.apache.spark.ml.Transformer;
import org.apache.spark.ml.classification.*;
import org.apache.spark.ml.linalg.*;
import org.apache.spark.ml.regression.RandomForestRegressionModel;
import org.apache.spark.mllib.linalg.DenseVector;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.tree.configuration.Algo;
import org.apache.spark.mllib.tree.model.DecisionTreeModel;
import org.apache.spark.mllib.tree.model.RandomForestModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import scala.Enumeration;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

abstract class Predictor {
abstract double predict(Vector vector);
}

public class MainConvertModels {

public static final int seed = 42;

public static void main(String[] args) {

int numRows = 1000;
int numFeatures = 3;
int numClasses = 2;

double trainFraction = 0.8;
double testFraction = 0.2;


SparkSession spark = SparkSession.builder()
.appName("conversion app")
.master("local")
.getOrCreate();

//Dataset data = getData(spark, "libsvm", 
"/opt/spark2/data/mllib/sample_libsvm_data.txt");
Dataset data = getDummyData(spark, numRows, numFeatures, 
numClasses);

Dataset[] splits = data.randomSplit(new double[]{trainFraction, 
testFraction}, seed);
Dataset trainingData = splits[0];
Dataset testData = splits[1];
testData.cache();

List labels = getLabels(testData);
List features = getFeatures(testData);

DecisionTreeClassifier classifier1 = new DecisionTreeClassifier();
DecisionTreeClassificationModel model1 = classifier1.fit(trainingData);
final DecisionTreeModel convertedModel1 = 
convertDecisionTreeModel(model1, Algo.Classification());


RandomForestClassifier classifier = new RandomForestClassifier();
RandomForestClassificationModel model2 = classifier.fit(trainingData);
final RandomForestModel convertedModel2 = 
convertRandomForestModel(model2);


LogisticRegression lr = new LogisticRegression();
LogisticRegressionModel model3 = lr.fit(trainingData);
final org.apache.spark.mllib.classification.LogisticRegressionModel 
convertedModel3 = convertLogisticRegressionModel(model3);


System.out.println(

"** DecisionTreeClassifier\n" +
"** Original **" + getInfo(model1, testData) + "\n" +
"** New  **" + getInfo(new Predictor() {
double predict(Vector vector) {return 
convertedModel1.predict(vector);}
}, labels, features) + "\n" +

"\n" +

"** RandomForestClassifier\n" +
"** Original **" + getInfo(model2, testData) + "\n" +
"** New  **" + getInfo(new Predictor() {double 
predict(Vector vector) {return convertedModel2.predict(vector);}}, labels, 
features) + "\n" +

"\n" +

"** LogisticRegression\n" +
"** Original **" + getInfo(model3, testData) + "\n" +
"** New  **" + getInfo(new Predictor() {double 
predict(Vector vector) { return convertedModel3.predict(vector);}}, labels, 
features) + "\n" +

"");
}

static Dataset getData(SparkSession spark, String format, String 
location) {

return spark.read()
.format(format)
.load(location);
}

static Dataset getDummyData(SparkSession spark, int numberRows, int 
numberFeatures, int labelUpperBound) {

StructType schema = new StructType(new StructField[]{
new StructField("label", DataTypes.DoubleType, false, 
Metadata.empty()),

[jira] [Created] (SPARK-19445) Please remove tylerchap...@yahoo-inc.com subscription from u...@spark.apache.org

2017-02-03 Thread Aseem Bansal (JIRA)
Aseem Bansal created SPARK-19445:


 Summary: Please remove tylerchap...@yahoo-inc.com subscription 
from u...@spark.apache.org
 Key: SPARK-19445
 URL: https://issues.apache.org/jira/browse/SPARK-19445
 Project: Spark
  Issue Type: IT Help
  Components: Project Infra
Affects Versions: 2.1.0
Reporter: Aseem Bansal


Whenever a mail is sent to u...@spark.apache.org I receive this email

{noformat}
This is an automatically generated message.

tylerchap...@yahoo-inc.com is no longer with Yahoo! Inc.

Your message will not be forwarded.

If you have a sales inquiry, please email yahoosa...@yahoo-inc.com and someone 
will follow up with you shortly.

If you require assistance with a legal matter, please send a message to 
legal-noti...@yahoo-inc.com

Thank you!
{noformat}

It is clear that this user is no longer available. Please remove this email 
address from mailing list so we don't get so much spam.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19444) Tokenizer example does not compile without extra imports

2017-02-03 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851382#comment-15851382
 ] 

Aseem Bansal commented on SPARK-19444:
--

[~srowen]
I can find the location at 
https://github.com/apache/spark/blob/master/docs/ml-features.md
which led me to 
https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java#L40
 

but what is 
$example on:untyped_ops$

The imports are there. But seems this is broken. Then this is probably a 
parsing issue?

> Tokenizer example does not compile without extra imports
> 
>
> Key: SPARK-19444
> URL: https://issues.apache.org/jira/browse/SPARK-19444
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 2.1.0
>Reporter: Aseem Bansal
>Priority: Minor
>
> The example at http://spark.apache.org/docs/2.1.0/ml-features.html#tokenizer 
> does not compile without the following static import
> import static org.apache.spark.sql.functions.*;



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19444) Tokenizer example does not compile without extra imports

2017-02-03 Thread Aseem Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aseem Bansal updated SPARK-19444:
-
Priority: Minor  (was: Major)

> Tokenizer example does not compile without extra imports
> 
>
> Key: SPARK-19444
> URL: https://issues.apache.org/jira/browse/SPARK-19444
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 2.1.0
>Reporter: Aseem Bansal
>Priority: Minor
>
> The example at http://spark.apache.org/docs/2.1.0/ml-features.html#tokenizer 
> does not compile without the following static import
> import static org.apache.spark.sql.functions.*;



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19444) Tokenizer example does not compile without extra imports

2017-02-03 Thread Aseem Bansal (JIRA)
Aseem Bansal created SPARK-19444:


 Summary: Tokenizer example does not compile without extra imports
 Key: SPARK-19444
 URL: https://issues.apache.org/jira/browse/SPARK-19444
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 2.1.0
Reporter: Aseem Bansal


The example at http://spark.apache.org/docs/2.1.0/ml-features.html#tokenizer 
does not compile without the following static import

import static org.apache.spark.sql.functions.*;



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-19410) Links to API documentation are broken

2017-01-31 Thread Aseem Bansal (JIRA)
Aseem Bansal created an issue

 Spark / SPARK-19410
 Links to API documentation are broken

Issue Type: Documentation
Affects Versions: 2.1.0
Assignee: Unassigned
Components: Documentation
Created: 31/Jan/17 08:55
Priority: Major
Reporter: Aseem Bansal

I was looking at 
https://spark.apache.org/docs/latest/ml-pipeline.html#example-estimator-transformer-and-param
 and saw that the links to API documentation are broken

[jira] [Created] (ZOOKEEPER-2657) Using zookeeper without SASL causes error logging

2016-12-29 Thread Aseem Bansal (JIRA)
Aseem Bansal created ZOOKEEPER-2657:
---

 Summary: Using zookeeper without SASL causes error logging
 Key: ZOOKEEPER-2657
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2657
 Project: ZooKeeper
  Issue Type: Improvement
Affects Versions: 3.4.6
Reporter: Aseem Bansal


We are using Kafka which uses zookeeper. But we are not using SASL. So we keep 
on getting 

{noformat}
CRITICAL: Found 32 lines (limit=1/1): (1) 2016-12-16 07:02:14.780 [INFO ] [r] 
org.apache.zookeeper.ClientCnxn [] - Opening socket connection to server 
10.0.1.47/10.0.1.47:2181. Will not attempt to authenticate using SASL (unknown 
error)
{noformat}

Found http://stackoverflow.com/a/26532778/2235567

Based on the above, looked and found this: 
https://svn.apache.org/repos/asf/zookeeper/trunk/src/java/main/org/apache/zookeeper/client/ZooKeeperSaslClient.java

Searched for "Will not attempt to authenticate using SASL" and found the 
"unknown error".

Can the message be changed so that the word error is not there as it is not 
really an error?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (SPARK-10413) Model should support prediction on single instance

2016-12-08 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15734495#comment-15734495
 ] 

Aseem Bansal edited comment on SPARK-10413 at 12/9/16 6:39 AM:
---

Hi

Is anyone working on this? And is there a JIRA ticket for having a predict 
method on PipelineModel?


was (Author: anshbansal):
Hi

Is anyone working on this?

> Model should support prediction on single instance
> --
>
> Key: SPARK-10413
> URL: https://issues.apache.org/jira/browse/SPARK-10413
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML
>Reporter: Xiangrui Meng
>Priority: Critical
>
> Currently models in the pipeline API only implement transform(DataFrame). It 
> would be quite useful to support prediction on single instance.
> UPDATE: This issue is for making predictions with single models.  We can make 
> methods like {{def predict(features: Vector): Double}} public.
> * This issue is *not* for single-instance prediction for full Pipelines, 
> which would require making predictions on {{Row}}s.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10413) Model should support prediction on single instance

2016-12-08 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15734495#comment-15734495
 ] 

Aseem Bansal commented on SPARK-10413:
--

Hi

Is anyone working on this?

> Model should support prediction on single instance
> --
>
> Key: SPARK-10413
> URL: https://issues.apache.org/jira/browse/SPARK-10413
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML
>Reporter: Xiangrui Meng
>Priority: Critical
>
> Currently models in the pipeline API only implement transform(DataFrame). It 
> would be quite useful to support prediction on single instance.
> UPDATE: This issue is for making predictions with single models.  We can make 
> methods like {{def predict(features: Vector): Double}} public.
> * This issue is *not* for single-instance prediction for full Pipelines, 
> which would require making predictions on {{Row}}s.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18241) If Spark Launcher fails to startApplication then handle's state does not change

2016-11-03 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15631898#comment-15631898
 ] 

Aseem Bansal commented on SPARK-18241:
--

Looking at the source code after mainClass = Utils.classForName(childMainClass) at 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L695
 I see that the exceptions are being printed instead of being thrown/sent to the 
listeners.

The API says that startApplication is preferred, but the various failures need to 
be sent via the handlers, otherwise the listener API is not useful. Another case 
where failures are not sent via the Launcher API: 
https://issues.apache.org/jira/browse/SPARK-17742
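
To illustrate the gap, a minimal sketch (mine, not from the original report; paths 
and class names are placeholders) of the only workaround I see today, which is to 
watch the handle state ourselves and time out if it never leaves UNKNOWN:

{code}
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LauncherStateWatch {

    public static void main(String[] args) throws Exception {
        SparkAppHandle handle = new SparkLauncher()
                .setSparkHome("/opt/spark2")              // placeholder
                .setAppResource("/tmp/my-job.jar")        // placeholder
                .setMainClass("com.example.Main")         // placeholder
                .setMaster("local[2]")
                .startApplication();

        // If spark-submit could not be started at all, no FAILED state and no
        // exception ever reach us, so the state simply stays UNKNOWN.
        long deadline = System.currentTimeMillis() + 60_000;
        while (handle.getState() == SparkAppHandle.State.UNKNOWN
                && System.currentTimeMillis() < deadline) {
            Thread.sleep(500);
        }
        if (handle.getState() == SparkAppHandle.State.UNKNOWN) {
            handle.kill();
            throw new IllegalStateException("launch reported no state; assuming it failed");
        }
    }
}
{code}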

> If Spark Launcher fails to startApplication then handle's state does not 
> change
> ---
>
> Key: SPARK-18241
> URL: https://issues.apache.org/jira/browse/SPARK-18241
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> I am using Spark 2.0.0. I am using 
> https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/launcher/SparkLauncher.html
>  to submit my job. 
> If there is a failure after launcher's startapplication has been called but 
> before the spark job has actually started (i.e. in starting the spark process 
> that submits the job itself) there is 
> * no exception in the main thread that is submitting the job 
> * no exception in the job as it has not started
> * no state change of the launcher
> * the exception is logged in the error stream on the default logger name that 
> spark produces using the Job's main class.
> Basically, it is not possible to catch an exception if it happens during that 
> time. The easiest way to reproduce it is to delete the JAR file or use an 
> invalid spark home while launching the job using sparkLauncher. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18241) If Spark Launcher fails to startApplication then handle's state does not change

2016-11-03 Thread Aseem Bansal (JIRA)
Aseem Bansal created SPARK-18241:


 Summary: If Spark Launcher fails to startApplication then handle's 
state does not change
 Key: SPARK-18241
 URL: https://issues.apache.org/jira/browse/SPARK-18241
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 2.0.0
Reporter: Aseem Bansal


I am using Spark 2.0.0. I am using 
https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/launcher/SparkLauncher.html
 to submit my job. 

If there is a failure after launcher's startapplication has been called but 
before the spark job has actually started (i.e. in starting the spark process 
that submits the job itself) there is 
* no exception in the main thread that is submitting the job 
* no exception in the job as it has not started
* no state change of the launcher
* the exception is logged in the error stream on the default logger name that 
spark produces using the Job's main class.

Basically, it is not possible to catch an exception if it happens during that 
time. The easiest way to reproduce it is to delete the JAR file or use an 
invalid spark home while launching the job using sparkLauncher. 
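
A minimal sketch of that reproduction (mine, not part of the original report; the 
paths are deliberately broken placeholders). Per the description above, the 
listener is never called and the handle never moves to FAILED:

{code}
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LauncherFailureRepro {

    public static void main(String[] args) throws Exception {
        SparkAppHandle handle = new SparkLauncher()
                .setSparkHome("/opt/spark2")              // placeholder spark home
                .setAppResource("/tmp/deleted-job.jar")   // jar that does not exist
                .setMainClass("com.example.Main")
                .setMaster("local[2]")
                .startApplication(new SparkAppHandle.Listener() {
                    @Override
                    public void stateChanged(SparkAppHandle h) {
                        System.out.println("state changed " + h.getState());
                    }

                    @Override
                    public void infoChanged(SparkAppHandle h) {
                        System.out.println("info changed " + h.getState());
                    }
                });

        Thread.sleep(30_000);
        // Per the report: no exception in this thread, no listener callback,
        // and the handle state is still UNKNOWN at this point.
        System.out.println("final state: " + handle.getState());
    }
}
{code}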



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-17742) Spark Launcher does not get failed state in Listener

2016-09-30 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15535315#comment-15535315
 ] 

Aseem Bansal edited comment on SPARK-17742 at 9/30/16 7:35 AM:
---

I dug into the launcher code to see if I could figure out how it works and find the 
bug. But when I reached LauncherServer's ServerConnection's handle method and found 
that this is socket programming, I found it harder to trace where the messages are 
coming from. Still trying to figure it out, but maybe someone who knows the Spark 
code better will find it easier to find the bug.


was (Author: anshbansal):
I dug into the launcher code to see if I can figure out how it is working and 
see if I could find the bug. But when I reached LauncherServer's 
ServerConnection's handle method and found that this is socket programming I 
found it harder to find where the messages are coming from. Still trying to 
figure out maybe someone who knows spark code better will find it easier to 
find the bug.

> Spark Launcher does not get failed state in Listener 
> -
>
> Key: SPARK-17742
> URL: https://issues.apache.org/jira/browse/SPARK-17742
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> I tried to launch an application using the below code. This is dummy code to 
> reproduce the problem. I tried exiting spark with status -1, throwing an 
> exception etc. but in no case did the listener give me failed status. But if 
> a spark job returns -1 or throws an exception from the main method it should 
> be considered as a failure. 
> {code}
> package com.example;
> import org.apache.spark.launcher.SparkAppHandle;
> import org.apache.spark.launcher.SparkLauncher;
> import java.io.IOException;
> public class Main2 {
> public static void main(String[] args) throws IOException, 
> InterruptedException {
> SparkLauncher launcher = new SparkLauncher()
> .setSparkHome("/opt/spark2")
> 
> .setAppResource("/home/aseem/projects/testsparkjob/build/libs/testsparkjob-1.0-SNAPSHOT.jar")
> .setMainClass("com.example.Main")
> .setMaster("local[2]");
> launcher.startApplication(new MyListener());
> Thread.sleep(1000 * 60);
> }
> }
> class MyListener implements SparkAppHandle.Listener {
> @Override
> public void stateChanged(SparkAppHandle handle) {
> System.out.println("state changed " + handle.getState());
> }
> @Override
> public void infoChanged(SparkAppHandle handle) {
> System.out.println("info changed " + handle.getState());
> }
> }
> {code}
> The spark job is 
> {code}
> package com.example;
> import org.apache.spark.sql.SparkSession;
> import java.io.IOException;
> public class Main {
> public static void main(String[] args) throws IOException {
> SparkSession sparkSession = SparkSession
> .builder()
> .appName("" + System.currentTimeMillis())
> .getOrCreate();
> try {
> for (int i = 0; i < 15; i++) {
> Thread.sleep(1000);
> System.out.println("sleeping 1");
> }
> } catch (InterruptedException e) {
> e.printStackTrace();
> }
> //sparkSession.stop();
> System.exit(-1);
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17742) Spark Launcher does not get failed state in Listener

2016-09-30 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15535315#comment-15535315
 ] 

Aseem Bansal commented on SPARK-17742:
--

I dug into the launcher code to see if I can figure out how it is working and 
see if I could find the bug. But when I reached LauncherServer's 
ServerConnection's handle method and found that this is socket programming I 
found it harder to find where the messages are coming from. Still trying to 
figure out maybe someone who knows spark code better will find it easier to 
find the bug.

> Spark Launcher does not get failed state in Listener 
> -
>
> Key: SPARK-17742
> URL: https://issues.apache.org/jira/browse/SPARK-17742
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> I tried to launch an application using the below code. This is dummy code to 
> reproduce the problem. I tried exiting spark with status -1, throwing an 
> exception etc. but in no case did the listener give me failed status. But if 
> a spark job returns -1 or throws an exception from the main method it should 
> be considered as a failure. 
> {code}
> package com.example;
> import org.apache.spark.launcher.SparkAppHandle;
> import org.apache.spark.launcher.SparkLauncher;
> import java.io.IOException;
> public class Main2 {
> public static void main(String[] args) throws IOException, 
> InterruptedException {
> SparkLauncher launcher = new SparkLauncher()
> .setSparkHome("/opt/spark2")
> 
> .setAppResource("/home/aseem/projects/testsparkjob/build/libs/testsparkjob-1.0-SNAPSHOT.jar")
> .setMainClass("com.example.Main")
> .setMaster("local[2]");
> launcher.startApplication(new MyListener());
> Thread.sleep(1000 * 60);
> }
> }
> class MyListener implements SparkAppHandle.Listener {
> @Override
> public void stateChanged(SparkAppHandle handle) {
> System.out.println("state changed " + handle.getState());
> }
> @Override
> public void infoChanged(SparkAppHandle handle) {
> System.out.println("info changed " + handle.getState());
> }
> }
> {code}
> The spark job is 
> {code}
> package com.example;
> import org.apache.spark.sql.SparkSession;
> import java.io.IOException;
> public class Main {
> public static void main(String[] args) throws IOException {
> SparkSession sparkSession = SparkSession
> .builder()
> .appName("" + System.currentTimeMillis())
> .getOrCreate();
> try {
> for (int i = 0; i < 15; i++) {
> Thread.sleep(1000);
> System.out.println("sleeping 1");
> }
> } catch (InterruptedException e) {
> e.printStackTrace();
> }
> //sparkSession.stop();
> System.exit(-1);
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17742) Spark Launcher does not get failed state in Listener

2016-09-30 Thread Aseem Bansal (JIRA)
Aseem Bansal created SPARK-17742:


 Summary: Spark Launcher does not get failed state in Listener 
 Key: SPARK-17742
 URL: https://issues.apache.org/jira/browse/SPARK-17742
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 2.0.0
Reporter: Aseem Bansal


I tried to launch an application using the below code. This is dummy code to 
reproduce the problem. I tried exiting spark with status -1, throwing an 
exception etc. but in no case did the listener give me failed status. But if a 
spark job returns -1 or throws an exception from the main method it should be 
considered as a failure. 

{code}
package com.example;

import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

import java.io.IOException;

public class Main2 {

public static void main(String[] args) throws IOException, 
InterruptedException {
SparkLauncher launcher = new SparkLauncher()
.setSparkHome("/opt/spark2")

.setAppResource("/home/aseem/projects/testsparkjob/build/libs/testsparkjob-1.0-SNAPSHOT.jar")
.setMainClass("com.example.Main")
.setMaster("local[2]");

launcher.startApplication(new MyListener());

Thread.sleep(1000 * 60);
}

}

class MyListener implements SparkAppHandle.Listener {

@Override
public void stateChanged(SparkAppHandle handle) {

System.out.println("state changed " + handle.getState());
}

@Override
public void infoChanged(SparkAppHandle handle) {
System.out.println("info changed " + handle.getState());
}
}
{code}

The spark job is 
{code}
package com.example;

import org.apache.spark.sql.SparkSession;
import java.io.IOException;

public class Main {

public static void main(String[] args) throws IOException {
SparkSession sparkSession = SparkSession
.builder()
.appName("" + System.currentTimeMillis())
.getOrCreate();


try {
for (int i = 0; i < 15; i++) {
Thread.sleep(1000);
System.out.println("sleeping 1");
}
} catch (InterruptedException e) {
e.printStackTrace();
}
//sparkSession.stop();

System.exit(-1);
}

}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17560) SQLContext tables returns table names in lower case only

2016-09-16 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495960#comment-15495960
 ] 

Aseem Bansal commented on SPARK-17560:
--

Can you share where this option needs to be set? Maybe I can try and add a pull 
request unless it is easier for you to just add a PR yourself instead of 
explaining.

> SQLContext tables returns table names in lower case only
> 
>
> Key: SPARK-17560
> URL: https://issues.apache.org/jira/browse/SPARK-17560
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> I registered a table using
> dataSet.createOrReplaceTempView("TestTable");
> Then I tried to get the list of tables using 
> sparkSession.sqlContext().tableNames()
> but the name that I got was testtable. It used to give table names in proper 
> case in Spark 1.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17560) SQLContext tables returns table names in lower case only

2016-09-16 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495906#comment-15495906
 ] 

Aseem Bansal commented on SPARK-17560:
--

Looked through 
https://spark.apache.org/docs/2.0.0/sql-programming-guide.html
https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/Dataset.html
https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/SparkSession.html
https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/SparkConf.html

and none of them say anything about this parameter

> SQLContext tables returns table names in lower case only
> 
>
> Key: SPARK-17560
> URL: https://issues.apache.org/jira/browse/SPARK-17560
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> I registered a table using
> dataSet.createOrReplaceTempView("TestTable");
> Then I tried to get the list of tables using 
> sparkSession.sqlContext().tableNames()
> but the name that I got was testtable. It used to give table names in proper 
> case in Spark 1.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17561) DataFrameWriter documentation formatting problems

2016-09-16 Thread Aseem Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aseem Bansal updated SPARK-17561:
-
Description: 
I visited this page
https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/DataFrameWriter.html

and saw  that the docs have formatting problems

!screenshot-1.png!

  was:
I visited this page
https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/DataFrameWriter.html

and saw  that the docs have formatting problems


> DataFrameWriter documentation formatting problems
> -
>
> Key: SPARK-17561
> URL: https://issues.apache.org/jira/browse/SPARK-17561
> Project: Spark
>  Issue Type: Documentation
>Reporter: Aseem Bansal
> Attachments: screenshot-1.png
>
>
> I visited this page
> https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/DataFrameWriter.html
> and saw  that the docs have formatting problems
> !screenshot-1.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17561) DataFrameWriter documentation formatting problems

2016-09-16 Thread Aseem Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aseem Bansal updated SPARK-17561:
-
Description: 
I visited this page
https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/DataFrameWriter.html

and saw  that the docs have formatting problems

!screenshot-1.png!

Tried with browser cache disabled. Same issue

  was:
I visited this page
https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/DataFrameWriter.html

and saw  that the docs have formatting problems

!screenshot-1.png!


> DataFrameWriter documentation formatting problems
> -
>
> Key: SPARK-17561
> URL: https://issues.apache.org/jira/browse/SPARK-17561
> Project: Spark
>  Issue Type: Documentation
>Reporter: Aseem Bansal
> Attachments: screenshot-1.png
>
>
> I visited this page
> https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/DataFrameWriter.html
> and saw  that the docs have formatting problems
> !screenshot-1.png!
> Tried with browser cache disabled. Same issue



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17561) DataFrameWriter documentation formatting problems

2016-09-16 Thread Aseem Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aseem Bansal updated SPARK-17561:
-
Attachment: screenshot-1.png

> DataFrameWriter documentation formatting problems
> -
>
> Key: SPARK-17561
> URL: https://issues.apache.org/jira/browse/SPARK-17561
> Project: Spark
>  Issue Type: Documentation
>Reporter: Aseem Bansal
> Attachments: screenshot-1.png
>
>
> I visited this page
> https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/DataFrameWriter.html
> and saw  that the docs have formatting problems



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17561) DataFrameWriter documentation formatting problems

2016-09-16 Thread Aseem Bansal (JIRA)
Aseem Bansal created SPARK-17561:


 Summary: DataFrameWriter documentation formatting problems
 Key: SPARK-17561
 URL: https://issues.apache.org/jira/browse/SPARK-17561
 Project: Spark
  Issue Type: Documentation
Reporter: Aseem Bansal
 Attachments: screenshot-1.png

I visited this page
https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/DataFrameWriter.html

and saw  that the docs have formatting problems



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17560) SQLContext tables returns table names in lower case only

2016-09-16 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495862#comment-15495862
 ] 

Aseem Bansal commented on SPARK-17560:
--

No I did not. Where?

> SQLContext tables returns table names in lower case only
> 
>
> Key: SPARK-17560
> URL: https://issues.apache.org/jira/browse/SPARK-17560
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> I registered a table using
> dataSet.createOrReplaceTempView("TestTable");
> Then I tried to get the list of tables using 
> sparkSession.sqlContext().tableNames()
> but the name that I got was testtable. It used to give table names in proper 
> case in Spark 1.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-17560) SQLContext tables returns table names in lower case only

2016-09-16 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495862#comment-15495862
 ] 

Aseem Bansal edited comment on SPARK-17560 at 9/16/16 9:38 AM:
---

No I did not. Where? Had not set that in Spark 1.4 either


was (Author: anshbansal):
No I did not. Where?

> SQLContext tables returns table names in lower case only
> 
>
> Key: SPARK-17560
> URL: https://issues.apache.org/jira/browse/SPARK-17560
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> I registered a table using
> dataSet.createOrReplaceTempView("TestTable");
> Then I tried to get the list of tables using 
> sparkSession.sqlContext().tableNames()
> but the name that I got was testtable. It used to give table names in proper 
> case in Spark 1.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17560) SQLContext tables returns table names in lower case only

2016-09-16 Thread Aseem Bansal (JIRA)
Aseem Bansal created SPARK-17560:


 Summary: SQLContext tables returns table names in lower case only
 Key: SPARK-17560
 URL: https://issues.apache.org/jira/browse/SPARK-17560
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Aseem Bansal


I registered a table using

dataSet.createOrReplaceTempView("TestTable");

Then I tried to get the list of tables using 

sparkSession.sqlContext().tableNames()

but the name that I got was testtable. It used to give table names in proper 
case in Spark 1.4
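
A minimal, self-contained sketch (my reconstruction of the steps above, not the 
original code) that shows the behaviour:

{code}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import java.util.Arrays;

public class TempViewNameCase {

    public static void main(String[] args) {
        SparkSession sparkSession = SparkSession.builder()
                .appName("temp view name case")
                .master("local")
                .getOrCreate();

        Dataset<Row> dataSet = sparkSession.range(5).toDF("value");
        dataSet.createOrReplaceTempView("TestTable");

        // On 2.0.0 this prints [testtable] instead of the original casing.
        System.out.println(Arrays.toString(sparkSession.sqlContext().tableNames()));

        sparkSession.stop();
    }
}
{code}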



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17307) Document what all access is needed on S3 bucket when trying to save a model

2016-09-06 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1540#comment-1540
 ] 

Aseem Bansal commented on SPARK-17307:
--

Not adding it there would be fine, but there needs to be something. Also, for 
contributing I tried searching for the file but could not find it. In which branch 
are you working?

> Document what all access is needed on S3 bucket when trying to save a model
> ---
>
> Key: SPARK-17307
> URL: https://issues.apache.org/jira/browse/SPARK-17307
> Project: Spark
>  Issue Type: Documentation
>Reporter: Aseem Bansal
>Priority: Minor
>
> I faced this lack of documentation when I was trying to save a model to S3. 
> Initially I thought it should be only write. Then I found it also needs 
> delete to delete temporary files. Now I requested access for delete and tried 
> again and I am get the error
> Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: 
> org.jets3t.service.S3ServiceException: S3 PUT failed for 
> '/dev-qa_%24folder%24' XML Error Message
> To reproduce this error the below can be used
> {code}
> SparkSession sparkSession = SparkSession
> .builder()
> .appName("my app")
> .master("local") 
> .getOrCreate();
> JavaSparkContext jsc = new 
> JavaSparkContext(sparkSession.sparkContext());
> jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", );
> jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey",  ACCESS KEY>);
> //Create a Pipelinemode
> 
> pipelineModel.write().overwrite().save("s3n:///dev-qa/modelTest");
> {code}
> This back and forth could be avoided if it was clearly mentioned what all 
> access spark needs to write to S3. Also would be great if why all of the 
> access is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17307) Document what all access is needed on S3 bucket when trying to save a model

2016-09-01 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454791#comment-15454791
 ] 

Aseem Bansal commented on SPARK-17307:
--

I would add that bit of information at 
http://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/util/MLWritable.html#save(java.lang.String)

Something like it needs complete read write access when using with S3 should be 
enough.

> Document what all access is needed on S3 bucket when trying to save a model
> ---
>
> Key: SPARK-17307
> URL: https://issues.apache.org/jira/browse/SPARK-17307
> Project: Spark
>  Issue Type: Documentation
>Reporter: Aseem Bansal
>Priority: Minor
>
> I faced this lack of documentation when I was trying to save a model to S3. 
> Initially I thought it should be only write. Then I found it also needs 
> delete to delete temporary files. Now I requested access for delete and tried 
> again and I am get the error
> Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: 
> org.jets3t.service.S3ServiceException: S3 PUT failed for 
> '/dev-qa_%24folder%24' XML Error Message
> To reproduce this error the below can be used
> {code}
> SparkSession sparkSession = SparkSession
> .builder()
> .appName("my app")
> .master("local") 
> .getOrCreate();
> JavaSparkContext jsc = new 
> JavaSparkContext(sparkSession.sparkContext());
> jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", );
> jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey",  ACCESS KEY>);
> //Create a Pipelinemode
> 
> pipelineModel.write().overwrite().save("s3n:///dev-qa/modelTest");
> {code}
> This back and forth could be avoided if it was clearly mentioned what all 
> access spark needs to write to S3. Also would be great if why all of the 
> access is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17307) Document what all access is needed on S3 bucket when trying to save a model

2016-08-29 Thread Aseem Bansal (JIRA)
Aseem Bansal created SPARK-17307:


 Summary: Document what all access is needed on S3 bucket when 
trying to save a model
 Key: SPARK-17307
 URL: https://issues.apache.org/jira/browse/SPARK-17307
 Project: Spark
  Issue Type: Documentation
Reporter: Aseem Bansal


I faced this lack of documentation when I was trying to save a model to S3. 
Initially I thought it should need only write access. Then I found it also needs 
delete access to delete temporary files. I requested access for delete, tried 
again, and now I get the error

Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: 
org.jets3t.service.S3ServiceException: S3 PUT failed for '/dev-qa_%24folder%24' 
XML Error Message

To reproduce this error the below can be used

{code}
SparkSession sparkSession = SparkSession
.builder()
.appName("my app")
.master("local") 
.getOrCreate();

JavaSparkContext jsc = new 
JavaSparkContext(sparkSession.sparkContext());

jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", );
jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", );

//Create a PipelineModel


pipelineModel.write().overwrite().save("s3n:///dev-qa/modelTest");
{code}

This back and forth could be avoided if it was clearly mentioned what all 
access Spark needs to write to S3. It would also be great to explain why all of 
the access is needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17012) Reading data frames via CSV - Allow to specify default value for integers

2016-08-10 Thread Aseem Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aseem Bansal updated SPARK-17012:
-
Description: 
Currently the option that we have in DataFrameReader is nullValue, which allows 
us a single default. But say our data frame has strings and integers and we want 
to specify the defaults for strings and integers differently; that is currently 
not possible.

If this is done per data type, then it should also be possible to allow 
specifying the schema as nullable false when inferring the schema (as a new 
option).

  was:Currently the option that we have in DataFrameReader is nullValue which 
allows us one default. But say in our data frame we have string and integers 
and we want to specify the default for strings and integers differently that is 
currently not possible.


> Reading data frames via CSV - Allow to specify default value for integers
> -
>
> Key: SPARK-17012
> URL: https://issues.apache.org/jira/browse/SPARK-17012
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> Currently the option that we have in DataFrameReader is nullValue which 
> allows us one default. But say in our data frame we have string and integers 
> and we want to specify the default for strings and integers differently that 
> is currently not possible.
> If it is done for different data types then it should be possible to allow to 
> specify the schema to be nullable false when inferring schema (as a new 
> option).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-17012) Reading data frames via CSV - Allow to specify default value for integers

2016-08-10 Thread Aseem Bansal (JIRA)
Aseem Bansal created SPARK-17012:


 Summary: Reading data frames via CSV - Allow to specify default 
value for integers
 Key: SPARK-17012
 URL: https://issues.apache.org/jira/browse/SPARK-17012
 Project: Spark
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Aseem Bansal


Currently the option that we have in DataFrameReader is nullValue, which allows 
us a single default. But say our data frame has strings and integers and we want 
to specify the defaults for strings and integers differently; that is currently 
not possible.
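
For context, a small sketch (the file path and columns are made up) of what is 
possible today: a single nullValue shared by every column, regardless of its type:

{code}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CsvNullValueToday {

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("csv nullValue")
                .master("local")
                .getOrCreate();

        // "NA" is treated as null for every column, whether it holds strings
        // or integers; there is no per-type (or per-column) default value.
        Dataset<Row> df = spark.read()
                .option("header", "true")
                .option("inferSchema", "true")
                .option("nullValue", "NA")
                .csv("/path/to/data.csv");   // placeholder path

        df.printSchema();
        df.show();

        spark.stop();
    }
}
{code}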



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16893) Spark CSV Provider option is not documented

2016-08-05 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409260#comment-15409260
 ] 

Aseem Bansal commented on SPARK-16893:
--

Yes. I would expect it to work without the use of the format function, as Spark's 
documentation does not say anything about needing to call format when using the 
csv function. 

> Spark CSV Provider option is not documented
> ---
>
> Key: SPARK-16893
> URL: https://issues.apache.org/jira/browse/SPARK-16893
> Project: Spark
>  Issue Type: Documentation
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>Priority: Minor
>
> I was working with databricks spark csv library and came across an error. I 
> have logged the issue in their github but it would be good to document that 
> in Apache Spark's documentation also
> I faced it with CSV. Someone else faced that with JSON 
> http://stackoverflow.com/questions/38761920/spark2-0-error-multiple-sources-found-for-json-when-read-json-file
> Complete Issue details here
> https://github.com/databricks/spark-csv/issues/367



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16893) Spark CSV Provider option is not documented

2016-08-05 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409183#comment-15409183
 ] 

Aseem Bansal commented on SPARK-16893:
--

Reading a CSV causes an exception. The code used and the exception are below. They 
are also present in the github issue that I have referenced here.

{code}
public static void main(String[] args) {
SparkSession spark = SparkSession
.builder()
.appName("my app")
.getOrCreate();

Dataset df = spark.read()
.format("com.databricks.spark.csv")
.option("header", "true")
.option("nullValue", "")
.csv("/home/aseem/data.csv")
;

df.show();
}
{code}

bq. Exception in thread "main" java.lang.RuntimeException: Multiple sources 
found for csv (org.apache.spark.sql.execution.datasources.csv.CSVFileFormat, 
com.databricks.spark.csv.DefaultSource15), please specify the fully qualified 
class name.

People need to use format("csv"). I think that is counter-intuitive, seeing that I 
am already using the csv method.

> Spark CSV Provider option is not documented
> ---
>
> Key: SPARK-16893
> URL: https://issues.apache.org/jira/browse/SPARK-16893
> Project: Spark
>  Issue Type: Documentation
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>Priority: Minor
>
> I was working with databricks spark csv library and came across an error. I 
> have logged the issue in their github but it would be good to document that 
> in Apache Spark's documentation also
> I faced it with CSV. Someone else faced that with JSON 
> http://stackoverflow.com/questions/38761920/spark2-0-error-multiple-sources-found-for-json-when-read-json-file
> Complete Issue details here
> https://github.com/databricks/spark-csv/issues/367



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16895) Reading empty string from csv has changed behaviour

2016-08-04 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408892#comment-15408892
 ] 

Aseem Bansal edited comment on SPARK-16895 at 8/5/16 5:19 AM:
--

I see that this is a duplicate. Regarding whether it is a bug or not, I heard 
someone say this about frameworks: 

> If a feature is not documented it does not exist. If a change is not 
> documented then it is a bug.


was (Author: anshbansal):
I understand that it is duplicate. Regarding it being a bug or not I heard 
someone say this. 

> If a feature is not documented it does not exist. If a change is not 
> documented then it is a bug.

> Reading empty string from csv has changed behaviour
> ---
>
> Key: SPARK-16895
> URL: https://issues.apache.org/jira/browse/SPARK-16895
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> I have a file called test.csv
> "a"
> ""
> When I read it in Spark 1.4 I get an empty string as value. When I read it in 
> 2.0 I get "null" as the String.
> The testing code is same as mentioned at
> https://github.com/databricks/spark-csv/issues/367



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16895) Reading empty string from csv has changed behaviour

2016-08-04 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408892#comment-15408892
 ] 

Aseem Bansal commented on SPARK-16895:
--

I understand that it is duplicate. Regarding it being a bug or not I heard 
someone say this. 

> If a feature is not documented it does not exist. If a change is not 
> documented then it is a bug.

> Reading empty string from csv has changed behaviour
> ---
>
> Key: SPARK-16895
> URL: https://issues.apache.org/jira/browse/SPARK-16895
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> I have a file called test.csv
> "a"
> ""
> When I read it in Spark 1.4 I get an empty string as value. When I read it in 
> 2.0 I get "null" as the String.
> The testing code is same as mentioned at
> https://github.com/databricks/spark-csv/issues/367



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16896) Loading csv with duplicate column names

2016-08-04 Thread Aseem Bansal (JIRA)
Aseem Bansal created SPARK-16896:


 Summary: Loading csv with duplicate column names
 Key: SPARK-16896
 URL: https://issues.apache.org/jira/browse/SPARK-16896
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Aseem Bansal


It would be great if the library allowed us to load a csv with duplicate column 
names. I understand that having duplicate columns in the data is odd, but 
sometimes we get upstream data that has duplicate columns. We may choose to 
ignore them, but currently there is no way to drop those columns because we are 
not able to load the file at all. Currently, as a pre-processing step, I loaded 
the data into R, changed the column names and then made a fixed version with 
which the Spark Java API can work. A minimal sketch of the failing load is shown 
below.

If we talk about other options, R's read.csv, for example, automatically takes 
care of such a situation by appending a number to the column name.

Also, case sensitivity in column names can cause problems. I mean if we have 
columns like

ColumnName, columnName

I may want to have them as separate. But the option to do this is not 
documented.
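
The sketch referred to above (made-up file contents and a placeholder path); per 
this report the read itself fails, so the duplicate column can never be dropped or 
renamed from the Java API:

{code}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class DuplicateCsvColumns {

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("duplicate csv columns")
                .master("local")
                .getOrCreate();

        // Assume /tmp/dup.csv contains:
        //   id,name,name
        //   1,a,b
        Dataset<Row> df = spark.read()
                .option("header", "true")
                .csv("/tmp/dup.csv");

        // Never reached on 2.0.0 according to this report.
        df.printSchema();

        spark.stop();
    }
}
{code}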



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16896) Loading csv with duplicate column names

2016-08-04 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15407604#comment-15407604
 ] 

Aseem Bansal commented on SPARK-16896:
--

[~hyukjin.kwon] cc

> Loading csv with duplicate column names
> ---
>
> Key: SPARK-16896
> URL: https://issues.apache.org/jira/browse/SPARK-16896
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> It would be great if the library allows us to load csv with duplicate column 
> names. I understand that having duplicate columns in the data is odd but 
> sometimes we get data that has duplicate columns. Getting upstream data like 
> that can happen. We may choose to ignore them but currently there is no way 
> to drop those as we are not able to load them at all. Currently as a 
> pre-processing I loaded the data into R, changed the column names and then 
> make a fixed version with which Spark Java API can work.
> But if talk about other options, e.g. R has read.csv which automatically 
> takes care of such situation by appending a number to the column name.
> Also case sensitivity in column names can also cause problems. I mean if we 
> have columns like
> ColumnName, columnName
> I may want to have them as separate. But the option to do this is not 
> documented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16893) Spark CSV Provider option is not documented

2016-08-04 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15407601#comment-15407601
 ] 

Aseem Bansal commented on SPARK-16893:
--

[~hyukjin.kwon] cc

> Spark CSV Provider option is not documented
> ---
>
> Key: SPARK-16893
> URL: https://issues.apache.org/jira/browse/SPARK-16893
> Project: Spark
>  Issue Type: Documentation
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> I was working with databricks spark csv library and came across an error. I 
> have logged the issue in their github but it would be good to document that 
> in Apache Spark's documentation also
> I faced it with CSV. Someone else faced that with JSON 
> http://stackoverflow.com/questions/38761920/spark2-0-error-multiple-sources-found-for-json-when-read-json-file
> Complete Issue details here
> https://github.com/databricks/spark-csv/issues/367



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16895) Reading empty string from csv has changed behaviour

2016-08-04 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15407602#comment-15407602
 ] 

Aseem Bansal commented on SPARK-16895:
--

[~hyukjin.kwon] cc

> Reading empty string from csv has changed behaviour
> ---
>
> Key: SPARK-16895
> URL: https://issues.apache.org/jira/browse/SPARK-16895
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> I have a file called test.csv
> "a"
> ""
> When I read it in Spark 1.4 I get an empty string as value. When I read it in 
> 2.0 I get "null" as the String.
> The testing code is same as mentioned at
> https://github.com/databricks/spark-csv/issues/367



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16895) Reading empty string from csv has changed behaviour

2016-08-04 Thread Aseem Bansal (JIRA)
Aseem Bansal created SPARK-16895:


 Summary: Reading empty string from csv has changed behaviour
 Key: SPARK-16895
 URL: https://issues.apache.org/jira/browse/SPARK-16895
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Aseem Bansal


I have a file called test.csv

"a"
""

When I read it in Spark 1.4 I get an empty string as value. When I read it in 
2.0 I get "null" as the String.

The testing code is same as mentioned at
https://github.com/databricks/spark-csv/issues/367
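
A minimal sketch of the check (reconstructed; not the exact code from the linked 
issue, and the path is a placeholder):

{code}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EmptyStringCsv {

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("empty string csv")
                .master("local")
                .getOrCreate();

        // test.csv contains the two lines:  "a"  and  ""
        Dataset<Row> df = spark.read().csv("/path/to/test.csv");

        // On 1.4 (spark-csv) the second row was an empty string,
        // on 2.0.0 it comes back as null.
        df.collectAsList().forEach(row ->
                System.out.println(row.isNullAt(0) ? "null" : "'" + row.getString(0) + "'"));

        spark.stop();
    }
}
{code}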




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16893) Spark CSV Provider option is not documented

2016-08-04 Thread Aseem Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aseem Bansal updated SPARK-16893:
-
Description: 
I was working with databricks spark csv library and came across an error. I 
have logged the issue in their github but it would be good to document that in 
Apache Spark's documentation also

I faced it with CSV. Someone else faced that with JSON 
http://stackoverflow.com/questions/38761920/spark2-0-error-multiple-sources-found-for-json-when-read-json-file

Complete Issue details here
https://github.com/databricks/spark-csv/issues/367

  was:
I was working with databricks spark csv library and came across an error. I 
have logged the issue in their github but it would be good to document that in 
Apache Spark's documentation also

Details here
https://github.com/databricks/spark-csv/issues/367


> Spark CSV Provider option is not documented
> ---
>
> Key: SPARK-16893
> URL: https://issues.apache.org/jira/browse/SPARK-16893
> Project: Spark
>  Issue Type: Documentation
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> I was working with databricks spark csv library and came across an error. I 
> have logged the issue in their github but it would be good to document that 
> in Apache Spark's documentation also
> I faced it with CSV. Someone else faced that with JSON 
> http://stackoverflow.com/questions/38761920/spark2-0-error-multiple-sources-found-for-json-when-read-json-file
> Complete Issue details here
> https://github.com/databricks/spark-csv/issues/367



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16893) Spark CSV Provider option is not documented

2016-08-04 Thread Aseem Bansal (JIRA)
Aseem Bansal created SPARK-16893:


 Summary: Spark CSV Provider option is not documented
 Key: SPARK-16893
 URL: https://issues.apache.org/jira/browse/SPARK-16893
 Project: Spark
  Issue Type: Documentation
Affects Versions: 2.0.0
Reporter: Aseem Bansal


I was working with databricks spark csv library and came across an error. I 
have logged the issue in their github but it would be good to document that in 
Apache Spark's documentation also

Details here
https://github.com/databricks/spark-csv/issues/367



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (GROOVY-7727) Cannot create unicode sequences using \Uxxxxxx

2016-01-29 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/GROOVY-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15123730#comment-15123730
 ] 

Aseem Bansal commented on GROOVY-7727:
--

I tried your code in Java.
\Uxxxxxx failed in Java also: https://ideone.com/W99GOs

And that line says "provides a code point input method which accepts strings", 
so there is a specific input method which accepts this format, rather than it 
being a source-level escape.

So this is not a bug. If you find the method name please do share.
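
A small plain-Java sketch (my own, not the linked ideone paste) showing that the 
\Uxxxxxx form is not a source-level escape and that the supplementary character 
has to be built from its code point instead:

{code}
public class SupplementaryChar {

    public static void main(String[] args) {
        // String s = "\U01f5d0";   // does not compile: "illegal escape character"

        // Build the character from its code point (equivalent to the
        // surrogate pair "\uD83D\uDDD0").
        String s = new String(Character.toChars(0x1F5D0));
        System.out.println(s + " length=" + s.length());   // length=2 UTF-16 code units
    }
}
{code}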

> Cannot create unicode sequences using \Uxxxxxx
> --
>
> Key: GROOVY-7727
> URL: https://issues.apache.org/jira/browse/GROOVY-7727
> Project: Groovy
>  Issue Type: Bug
>  Components: Compiler
>Affects Versions: 2.4.5
>Reporter: Andres Almiray
>
> According to 
> http://www.oracle.com/technetwork/articles/java/supplementary-142654.html 
> "For text input, the Java 2 SDK provides a code point input method which 
> accepts strings of the form "\Uxxxxxx", where the uppercase "U" indicates 
> that the escape sequence contains six hexadecimal digits, thus allowing for 
> supplementary characters. A lowercase "u" indicates the original form of the 
> escape sequences, "\uxxxx". You can find this input method and its 
> documentation in the directory demo/jfc/CodePointIM of the J2SDK."
> The following code fails with a syntax exception
> s = "\U01f5d0"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (GROOVY-7625) Slashy string in groovy allows brackets but double quoted string does not. Why?

2015-10-10 Thread Aseem Bansal (JIRA)
Aseem Bansal created GROOVY-7625:


 Summary: Slashy string in groovy allows brackets but double quoted 
string does not. Why?
 Key: GROOVY-7625
 URL: https://issues.apache.org/jira/browse/GROOVY-7625
 Project: Groovy
  Issue Type: Documentation
Reporter: Aseem Bansal
Priority: Minor


This

println("$()")

gives me a compiler error "Either escape a dollar sign or bracket the value 
expression"

But this 

println(/$()/)

prints `$()` fine. No errors

Why is there a difference? The only documented difference is that slashy 
strings make working with backslashes easier. I understand that a variable name 
cannot start with a bracket so it should be possible to make that special case. 
Is that the case for the slashy strings?

Just came across this when doing something with regex.
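For comparison, a minimal sketch (Groovy 2.4.x assumed) of the variants that do 
and do not compile, based on the error message above:

{noformat}
// println("$()")        // does not compile: "$(" starts a value expression in a GString
println("\$()")          // escaping the dollar sign compiles and prints: $()
println("${'$'}()")      // bracketing the value expression also prints: $()
println(/$()/)           // slashy string prints: $() with no escaping needed
{noformat}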



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (GROOVY-7603) Update groovy docs for Category

2015-09-25 Thread Aseem Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/GROOVY-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aseem Bansal updated GROOVY-7603:
-
Description: 
Category docs refer to @Mixin but they are deprecated 
http://docs.groovy-lang.org/latest/html/gapi/groovy/lang/Category.html

I found that using traits is not possible. But it would be nice to be able to 
use them. 

Tried a workaround but it didn't work:

{noformat}
trait Util {
Number getTwice() { this * 2 }
Number max(Number otherNumber) { Math.max(this, otherNumber) }
}

@groovy.lang.Category(Number)
abstract class UtilCategory implements Util {
}
{noformat}
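
For contrast, a minimal sketch of the conventional (non-trait) @Category usage 
that does work, Groovy 2.x assumed, in case it helps whoever updates the docs:

{noformat}
// Conventional @Category usage without traits.
@groovy.lang.Category(Number)
class NumberUtil {
    Number getTwice() { this * 2 }
}

use(NumberUtil) {
    assert 4.getTwice() == 8
    assert 4.twice == 8      // category getters are also reachable as properties
}
{noformat}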


  was:
Category docs refer to @Mixin but they are deprecated 
http://docs.groovy-lang.org/latest/html/gapi/groovy/lang/Category.html

I found that using traits is not possible. But it would be nice to be able to 
use them. 


> Update groovy docs for Category
> ---
>
> Key: GROOVY-7603
> URL: https://issues.apache.org/jira/browse/GROOVY-7603
> Project: Groovy
>  Issue Type: Documentation
>Reporter: Aseem Bansal
>
> Category docs refer to @Mixin but they are deprecated 
> http://docs.groovy-lang.org/latest/html/gapi/groovy/lang/Category.html
> I found that using traits is not possible. But it would be nice to be able to 
> use them. 
> Tried a workaround but it didn't work:
> {noformat}
> trait Util {
> Number getTwice() { this * 2 }
> Number max(Number otherNumber) { Math.max(this, otherNumber) }
> }
> @groovy.lang.Category(Number)
> abstract class UtilCategory implements Util {
> }
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (GROOVY-7603) Update groovy docs for Category

2015-09-25 Thread Aseem Bansal (JIRA)
Aseem Bansal created GROOVY-7603:


 Summary: Update groovy docs for Category
 Key: GROOVY-7603
 URL: https://issues.apache.org/jira/browse/GROOVY-7603
 Project: Groovy
  Issue Type: Documentation
Reporter: Aseem Bansal


Category docs refer to @Mixin but they are deprecated 
http://docs.groovy-lang.org/latest/html/gapi/groovy/lang/Category.html

Can traits be used?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (GROOVY-7604) traits docs diamond problem explanation

2015-09-25 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/GROOVY-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908937#comment-14908937
 ] 

Aseem Bansal commented on GROOVY-7604:
--

I read further and found the section "Default conflict resolution" which gives 
the correct explanation

> traits docs diamond problem explanation
> ---
>
> Key: GROOVY-7604
> URL: https://issues.apache.org/jira/browse/GROOVY-7604
> Project: Groovy
>  Issue Type: Documentation
>Reporter: Aseem Bansal
>
> http://www.groovy-lang.org/objectorientation.html#_composition_of_behaviors
> has an example right after referring to the diamond problem. Is it a correct 
> example of the diamond problem? Shouldn't the method names be the same?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (GROOVY-7603) Update groovy docs for Category

2015-09-25 Thread Aseem Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/GROOVY-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aseem Bansal updated GROOVY-7603:
-
Description: 
Category docs refer to @Mixin but they are deprecated 
http://docs.groovy-lang.org/latest/html/gapi/groovy/lang/Category.html

I found that using traits is not possible. But it would be nice to be able to 
use them. 

  was:
Category docs refer to @Mixin but they are deprecated 
http://docs.groovy-lang.org/latest/html/gapi/groovy/lang/Category.html

Can traits be used?


> Update groovy docs for Category
> ---
>
> Key: GROOVY-7603
> URL: https://issues.apache.org/jira/browse/GROOVY-7603
> Project: Groovy
>  Issue Type: Documentation
>Reporter: Aseem Bansal
>
> Category docs refer to @Mixin but they are deprecated 
> http://docs.groovy-lang.org/latest/html/gapi/groovy/lang/Category.html
> I found that using traits is not possible. But it would be nice to be able to 
> use them. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (GROOVY-7605) Improve docs for MetaClass getMethods vs getMetaMethods

2015-09-25 Thread Aseem Bansal (JIRA)
Aseem Bansal created GROOVY-7605:


 Summary: Improve docs for MetaClass getMethods vs getMetaMethods
 Key: GROOVY-7605
 URL: https://issues.apache.org/jira/browse/GROOVY-7605
 Project: Groovy
  Issue Type: Documentation
Reporter: Aseem Bansal


The current explanation at 
http://docs.groovy-lang.org/latest/html/api/groovy/lang/MetaClass.html is not 
clear. 

I know that there is an explanation at 
http://www.groovy-lang.org/mailing-lists.html#nabble-td388327 but it just shows 
that Graeme added a method. I am guessing he added getMetaMethods.

As far as I can tell from running them, getMethods returns the non-meta methods 
while getMetaMethods returns only the meta methods.
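
A minimal sketch (Groovy 2.x assumed) for comparing the two lists empirically; 
the exact split between them is what the docs should spell out:

{noformat}
// Compare what MetaClass.getMethods() and MetaClass.getMetaMethods() report.
class Foo {
    def bar() { 'bar' }          // a regular method declared on the class
}
Foo.metaClass.baz = { 'baz' }    // a method added through the metaclass

def mc = new Foo().metaClass
println mc.methods.collect { it.name }.unique().sort()
println mc.metaMethods.collect { it.name }.unique().sort()
{noformat}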



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (GROOVY-7592) Problem in switch statement docs

2015-09-21 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/GROOVY-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901833#comment-14901833
 ] 

Aseem Bansal commented on GROOVY-7592:
--

[~pascalschumacher]
Did you fix the docs? Because I remember that there is another place where that 
is written. Don't remember exactly where.

> Problem in switch statement docs
> 
>
> Key: GROOVY-7592
> URL: https://issues.apache.org/jira/browse/GROOVY-7592
> Project: Groovy
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 2.4.5
>Reporter: Aseem Bansal
>Assignee: Pascal Schumacher
>Priority: Minor
> Fix For: 2.4.6
>
>
> As per http://www.groovy-lang.org/semantics.html default must be the last 
> thing in switch case. 
> Based on that I sent a PR which has been accepted
> https://github.com/apache/incubator-groovy/pull/82
> But when I tried to use default somewhere else it worked fine
> {noformat}
> String str = "aseem"
> switch(str) {
> default:
> println "default"
> break
> case "aseem":
> println "Aseem"
> break
> } 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (GROOVY-7578) Image present for metaprogramming is incorrect

2015-09-04 Thread Aseem Bansal (JIRA)
Aseem Bansal created GROOVY-7578:


 Summary: Image present for metaprogramming is incorrect
 Key: GROOVY-7578
 URL: https://issues.apache.org/jira/browse/GROOVY-7578
 Project: Groovy
  Issue Type: Documentation
Reporter: Aseem Bansal


I am reading the Groovy metaprogramming documentation and saw that the image 
present is wrong. There is an image present, but the flow it shows is wrong.

It has a block "Method exists in MetaClass or Class" two times: one after 
GroovyInterceptable and the other after the first "Method exists in MetaClass 
or Class" block.

I tried a simple program, and based on it I believe the first block should say 
Class only and the second block should say MetaClass only.
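
For whoever fixes the diagram, a minimal sketch (Groovy 2.x assumed) of the 
GroovyInterceptable part of the flow, which is what the first block sits after:

{noformat}
// With GroovyInterceptable, invokeMethod intercepts every call,
// even for methods that do exist on the class.
class Intercepted implements GroovyInterceptable {
    def existing() { 'existing' }

    def invokeMethod(String name, args) {
        'intercepted ' + name
    }
}

assert new Intercepted().existing() == 'intercepted existing'
assert new Intercepted().missing()  == 'intercepted missing'
{noformat}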



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (GROOVY-7579) Improve docs for invokeMethod

2015-09-04 Thread Aseem Bansal (JIRA)
Aseem Bansal created GROOVY-7579:


 Summary: Improve docs for invokeMethod
 Key: GROOVY-7579
 URL: https://issues.apache.org/jira/browse/GROOVY-7579
 Project: Groovy
  Issue Type: Documentation
Reporter: Aseem Bansal


I was reading the metaprogramming documentation when I noticed that "this method 
is called when the method you called is not present on a Groovy object".

As per the diagram, that statement is incorrect. invokeMethod is invoked when 
methodMissing is not present; that is what the diagram shows.

Also, as per the answer at 
http://stackoverflow.com/questions/19220370/what-is-the-difference-between-invokemethod-and-methodmissing
this is not an appropriate example.

Saying this because the answer by blackdrag (who I understand is a core 
committer to Groovy) says that methodMissing should be used instead.

Also, the same page mentions the "overhead of invokeMethod". It would be nice to 
have a better explanation in the section on invokeMethod itself.

I am not knowledgeable about this so cannot suggest what can be added. But it 
would be better to have the explanation in the official docs.
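
For reference, a minimal sketch (Groovy 2.x assumed) of methodMissing, which is 
what the answer linked above recommends for the "method not present" case:

{noformat}
// methodMissing is only consulted when the called method cannot be found.
class Dynamic {
    def real() { 'real' }

    def methodMissing(String name, args) {
        'missing ' + name
    }
}

assert new Dynamic().real()  == 'real'            // existing method, methodMissing not used
assert new Dynamic().other() == 'missing other'   // unknown method handled by methodMissing
{noformat}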



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (GROOVY-7580) ExpandoMetaClass append method does not throw an exception as per docs

2015-09-04 Thread Aseem Bansal (JIRA)
Aseem Bansal created GROOVY-7580:


 Summary: ExpandoMetaClass append method does not throw an 
exception as per docs
 Key: GROOVY-7580
 URL: https://issues.apache.org/jira/browse/GROOVY-7580
 Project: Groovy
  Issue Type: Bug
Reporter: Aseem Bansal


I was reading the docs when I came across "Note that the left shift operator is 
used to append a new method. If the method already exists an exception will be 
thrown."

I decided to try it via the below program. There was no exception. I am using 
groovy 2.3.8

{noformat}
class A {
}
A.metaClass.hello = {
  "hello superclass"
}

class B extends A {
}
B.metaClass.hello << {
  "hello subclass"
}

B.metaClass.hello << {
  "hello subclass"
}

new B().hello()
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-2425) Migrate website from SVN to Git

2015-08-19 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703327#comment-14703327
 ] 

Aseem Bansal edited comment on KAFKA-2425 at 8/19/15 4:47 PM:
--

Sorry [~omkreddy], but I am not getting the time currently; I have been busy for 
the past few days. I checked the INFRA team ticket and it says that only the 
asf-site branch is supported. I understand that is definitely a bummer. Just 
thinking: would it be possible to use Travis or something else to 
auto-cherry-pick from trunk/master to this branch? Then commits could be done to 
master and the script would do the cherry-picks. I don't know how to do it but 
will look into whether it is possible.

Something like 
http://lea.verou.me/2011/10/easily-keep-gh-pages-in-sync-with-master/


was (Author: anshbansal):
Sorry [~omkreddy] but not getting the time currently. Been busy for past some 
days. I checked INFRA team ticket and it says that only asf-site branch is 
supported. I understand that it is definitely a bummer. Just thinking whether 
it would be possible to use travis or something else to auto cherry pick from 
trunk/master to this branch? Then the commits can be done to master and let the 
script do the cherry picks. Don't know how to do it but will look if it is 
possible.

 Migrate website from SVN to Git 
 

 Key: KAFKA-2425
 URL: https://issues.apache.org/jira/browse/KAFKA-2425
 Project: Kafka
  Issue Type: Improvement
Reporter: Ismael Juma
Assignee: Manikumar Reddy

 The preference is to share the same Git repo for the code and website as per 
 discussion in the mailing list:
 http://search-hadoop.com/m/uyzND1Dux842dm7vg2
 Useful reference:
 https://blogs.apache.org/infra/entry/git_based_websites_available



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2425) Migrate website from SVN to Git

2015-08-19 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14703327#comment-14703327
 ] 

Aseem Bansal commented on KAFKA-2425:
-

Sorry [~omkreddy], but I am not getting the time currently; I have been busy for 
the past few days. I checked the INFRA team ticket and it says that only the 
asf-site branch is supported. I understand that is definitely a bummer. Just 
thinking: would it be possible to use Travis or something else to 
auto-cherry-pick from trunk/master to this branch? Then commits could be done to 
master and the script would do the cherry-picks. I don't know how to do it but 
will look into whether it is possible.

 Migrate website from SVN to Git 
 

 Key: KAFKA-2425
 URL: https://issues.apache.org/jira/browse/KAFKA-2425
 Project: Kafka
  Issue Type: Improvement
Reporter: Ismael Juma
Assignee: Manikumar Reddy

 The preference is to share the same Git repo for the code and website as per 
 discussion in the mailing list:
 http://search-hadoop.com/m/uyzND1Dux842dm7vg2
 Useful reference:
 https://blogs.apache.org/infra/entry/git_based_websites_available



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2425) Migrate website from SVN to Git

2015-08-12 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693316#comment-14693316
 ] 

Aseem Bansal commented on KAFKA-2425:
-

Yes, I am interested. But how do I do that? I can take a checkout of the Kafka 
code from https://github.com/apache/kafka. Where can I get the SVN code? Also, 
is there anything specific to take care of?

 Migrate website from SVN to Git 
 

 Key: KAFKA-2425
 URL: https://issues.apache.org/jira/browse/KAFKA-2425
 Project: Kafka
  Issue Type: Improvement
Reporter: Ismael Juma

 The preference is to share the same Git repo for the code and website as per 
 discussion in the mailing list:
 http://search-hadoop.com/m/uyzND1Dux842dm7vg2
 Useful reference:
 https://blogs.apache.org/infra/entry/git_based_websites_available



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-2425) Migrate website from SVN to Git

2015-08-12 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693505#comment-14693505
 ] 

Aseem Bansal edited comment on KAFKA-2425 at 8/12/15 1:54 PM:
--

The Infra ticket has fields Git Notification Mailing List and Git Repository 
Import Path. I am not sure what they are.

Project: Infrastructure
Issue Type: SVN-GIT Migration


was (Author: anshbansal):
The Infra ticket has fields Git Notification Mailing List and Git Repository 
Import Path. I am not sure what they are.

 Migrate website from SVN to Git 
 

 Key: KAFKA-2425
 URL: https://issues.apache.org/jira/browse/KAFKA-2425
 Project: Kafka
  Issue Type: Improvement
Reporter: Ismael Juma

 The preference is to share the same Git repo for the code and website as per 
 discussion in the mailing list:
 http://search-hadoop.com/m/uyzND1Dux842dm7vg2
 Useful reference:
 https://blogs.apache.org/infra/entry/git_based_websites_available



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2425) Migrate website from SVN to Git

2015-08-12 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693505#comment-14693505
 ] 

Aseem Bansal commented on KAFKA-2425:
-

The Infra ticket has fields Git Notification Mailing List and Git Repository 
Import Path. I am not sure what they are.

 Migrate website from SVN to Git 
 

 Key: KAFKA-2425
 URL: https://issues.apache.org/jira/browse/KAFKA-2425
 Project: Kafka
  Issue Type: Improvement
Reporter: Ismael Juma

 The preference is to share the same Git repo for the code and website as per 
 discussion in the mailing list:
 http://search-hadoop.com/m/uyzND1Dux842dm7vg2
 Useful reference:
 https://blogs.apache.org/infra/entry/git_based_websites_available



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (GROOVY-7543) Suggestion for Download page

2015-08-07 Thread Aseem Bansal (JIRA)
Aseem Bansal created GROOVY-7543:


 Summary: Suggestion for Download page
 Key: GROOVY-7543
 URL: https://issues.apache.org/jira/browse/GROOVY-7543
 Project: Groovy
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 2.4.4
 Environment: Website
Reporter: Aseem Bansal
Priority: Trivial


On the groovy website download page http://www.groovy-lang.org/download.html 
there is a System requirements at the bottom.

It says JVM Required. Is it minimum/maximum/only ? If it is based on some 
automated build to test compatibility it would be good to link that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (GROOVY-7544) Nearly Duplicate sections in documentation

2015-08-07 Thread Aseem Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/GROOVY-7544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aseem Bansal updated GROOVY-7544:
-
Description: 
I was reading the documentation when I noticed that these two sections 
* http://www.groovy-lang.org/download.html
* http://www.groovy-lang.org/install.html

are nearly duplicates. Because of the duplication, each of them has some 
information that the other does not. It would be better to merge them into a 
single section.

I would suggest just keeping the download section, as it is better looking. 
Merge the extra information from the install section and then delete the 
install section. 

  was:
I was reading the documentation when I noticed that these two sections 
* http://www.groovy-lang.org/download.html
* http://www.groovy-lang.org/install.html

are nearly duplicate. It seems that due to duplicatacy both of them have some 
information which the other ne does not have. It would be better to merge them 
into single section.


 Nearly Duplicate sections in documentation
 --

 Key: GROOVY-7544
 URL: https://issues.apache.org/jira/browse/GROOVY-7544
 Project: Groovy
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 2.4.4
Reporter: Aseem Bansal

 I was reading the documentation when I noticed that these two sections 
 * http://www.groovy-lang.org/download.html
 * http://www.groovy-lang.org/install.html
 are nearly duplicates. Because of the duplication, each of them has some 
 information that the other does not. It would be better to merge them into a 
 single section.
 I would suggest just keeping the download section, as it is better looking. 
 Merge the extra information from the install section and then delete the 
 install section. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (GROOVY-7543) Suggestion for Download page

2015-08-07 Thread Aseem Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/GROOVY-7543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aseem Bansal updated GROOVY-7543:
-
Description: 
On the groovy website download page http://www.groovy-lang.org/download.html 
there is a System requirements at the bottom.

It says JVM Required. Is it minimum/maximum/only version of JVM supported? If 
it is based on some automated build to test compatibility it would be good to 
link that.

  was:
On the groovy website download page http://www.groovy-lang.org/download.html 
there is a System requirements at the bottom.

It says JVM Required. Is it minimum/maximum/only ? If it is based on some 
automated build to test compatibility it would be good to link that.


 Suggestion for Download page
 

 Key: GROOVY-7543
 URL: https://issues.apache.org/jira/browse/GROOVY-7543
 Project: Groovy
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 2.4.4
 Environment: Website
Reporter: Aseem Bansal
Priority: Trivial

 On the groovy website download page http://www.groovy-lang.org/download.html 
 there is a System requirements at the bottom.
 It says JVM Required. Is it minimum/maximum/only version of JVM supported? 
 If it is based on some automated build to test compatibility it would be good 
 to link that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SPARK-9678) HTTP request to BlockManager port yields exception

2015-08-07 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14662321#comment-14662321
 ] 

Aseem Bansal commented on SPARK-9678:
-

I understand. Just thought to mention that.

 HTTP request to BlockManager port yields exception
 --

 Key: SPARK-9678
 URL: https://issues.apache.org/jira/browse/SPARK-9678
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 1.4.1
 Environment: Ubuntu 14.0.4
Reporter: Aseem Bansal
Priority: Minor

 I was going through the quick start for spark 1.4.1 at 
 http://spark.apache.org/docs/latest/quick-start.html. I am using pySpark. 
 Also the exact version that I am using is spark-1.4.1-bin-hadoop2.4
 The quick start has textFile = sc.textFile("README.md"). I ran that and then 
 the following text appeared in the command line
 {noformat}
 15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(143840) called with 
 curMem=0, maxMem=278302556
 15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0 stored as values in 
 memory (estimated size 140.5 KB, free 265.3 MB)
 15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(12633) called with 
 curMem=143840, maxMem=278302556
 15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
 in memory (estimated size 12.3 KB, free 265.3 MB)
 15/08/06 10:37:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
 on localhost:53311 (size: 12.3 KB, free: 265.4 MB)
 15/08/06 10:37:03 INFO SparkContext: Created broadcast 0 from textFile at 
 NativeMethodAccessorImpl.java:-2
 {noformat}
 I saw that there was an IP in these logs i.e. localhost:53311
 I tried connecting to it via Google Chrome and got an exception.
 {noformat}
  15/08/06 10:37:30 WARN TransportChannelHandler: Exception in connection 
  from /127.0.0.1:54056
 io.netty.handler.codec.TooLongFrameException: Adjusted frame length exceeds 
 2147483647: 5135603447292250196 - discarded
   at 
 io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:501)
   at 
 io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:477)
   at 
 io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:403)
   at 
 io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:343)
   at 
 io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:249)
   at 
 io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:149)
   at 
 io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
   at 
 io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
   at 
 io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
   at 
 io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
   at 
 io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
   at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
   at 
 io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
   at 
 io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
   at java.lang.Thread.run(Thread.java:745)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (KAFKA-2364) Improve documentation for contributing to docs

2015-08-06 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659917#comment-14659917
 ] 

Aseem Bansal commented on KAFKA-2364:
-

How do you reply to and get those emails? 

 Improve documentation for contributing to docs
 --

 Key: KAFKA-2364
 URL: https://issues.apache.org/jira/browse/KAFKA-2364
 Project: Kafka
  Issue Type: Task
Reporter: Aseem Bansal
Priority: Minor
  Labels: doc

 While reading the documentation for kafka 8 I saw some improvements that can 
 be made. But the docs for contributing are not very good at 
 https://github.com/apache/kafka. It just gives me a URL for svn. But I am not 
 sure what to do. Can the README.MD file be improved for contributing to docs?
 I have submitted patches to groovy and grails by sending PRs via github but  
 looking at the comments on PRs submitted to kafka it seems PRs via github are 
 not working for kafka. It would be good to make that work also.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2364) Improve documentation for contributing to docs

2015-08-05 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659500#comment-14659500
 ] 

Aseem Bansal commented on KAFKA-2364:
-

I did this but didn't get a reply. Are the replies shown somewhere? 

 Improve documentation for contributing to docs
 --

 Key: KAFKA-2364
 URL: https://issues.apache.org/jira/browse/KAFKA-2364
 Project: Kafka
  Issue Type: Task
Reporter: Aseem Bansal
Priority: Minor
  Labels: doc

 While reading the documentation for kafka 8 I saw some improvements that can 
 be made. But the docs for contributing are not very good at 
 https://github.com/apache/kafka. It just gives me a URL for svn. But I am not 
 sure what to do. Can the README.MD file be improved for contributing to docs?
 I have submitted patches to groovy and grails by sending PRs via github but  
 looking at the comments on PRs submitted to kafka it seems PRs via github are 
 not working for kafka. It would be good to make that work also.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (SPARK-9678) Exception while going through quick start

2015-08-05 Thread Aseem Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aseem Bansal updated SPARK-9678:

Description: 
I was going through the quick start for spark 1.4.1 at 
http://spark.apache.org/docs/latest/quick-start.html. I am using pySpark. Also 
the exact version that I am using is spark-1.4.1-bin-hadoop2.4

The quick start has textFile = sc.textFile("README.md"). I ran that and then 
the following text appeared in the command line


15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(143840) called with 
curMem=0, maxMem=278302556
15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0 stored as values in 
memory (estimated size 140.5 KB, free 265.3 MB)
15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(12633) called with 
curMem=143840, maxMem=278302556
15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in 
memory (estimated size 12.3 KB, free 265.3 MB)
15/08/06 10:37:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 
localhost:53311 (size: 12.3 KB, free: 265.4 MB)
15/08/06 10:37:03 INFO SparkContext: Created broadcast 0 from textFile at 
NativeMethodAccessorImpl.java:-2


I saw that there was an IP in these logs i.e. localhost:53311

I tried connecting to it via Google Chrome and got an exception.

 15/08/06 10:37:30 WARN TransportChannelHandler: Exception in connection 
 from /127.0.0.1:54056
io.netty.handler.codec.TooLongFrameException: Adjusted frame length exceeds 
2147483647: 5135603447292250196 - discarded
at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:501)
at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:477)
at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:403)
at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:343)
at 
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:249)
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:149)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at java.lang.Thread.run(Thread.java:745)


  was:
I was going through the quick start for spark 1.4.1 at 
http://spark.apache.org/docs/latest/quick-start.html. I am using pySpark

The quick start has textFile = sc.textFile(README.md). I ran that and then 
the following text appeared in the command line


15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(143840) called with 
curMem=0, maxMem=278302556
15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0 stored as values in 
memory (estimated size 140.5 KB, free 265.3 MB)
15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(12633) called with 
curMem=143840, maxMem=278302556
15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in 
memory (estimated size 12.3 KB, free 265.3 MB)
15/08/06 10:37:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 
localhost:53311 (size: 12.3 KB, free: 265.4 MB)
15/08/06 10:37:03 INFO SparkContext: Created broadcast 0 from textFile at 
NativeMethodAccessorImpl.java:-2


I saw that there was an IP in these logs i.e. localhost:53311

I tried connecting to it via Google Chrome and got an exception.

 15/08/06 10:37:30 WARN TransportChannelHandler: Exception in connection 
 from /127.0.0.1:54056
io.netty.handler.codec.TooLongFrameException: Adjusted frame length exceeds 
2147483647: 5135603447292250196 - discarded
at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:501)
at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:477)
at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:403)
at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:343)
at 

[jira] [Created] (SPARK-9678) Exception while going through quick start

2015-08-05 Thread Aseem Bansal (JIRA)
Aseem Bansal created SPARK-9678:
---

 Summary: Exception while going through quick start
 Key: SPARK-9678
 URL: https://issues.apache.org/jira/browse/SPARK-9678
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.4.1
 Environment: Ubuntu 14.0.4
Reporter: Aseem Bansal


I was going through the quick start for spark 1.4.1 at 
http://spark.apache.org/docs/latest/quick-start.html. I am using pySpark

The quick start has textFile = sc.textFile("README.md"). I ran that and then 
the following text appeared in the command line


15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(143840) called with 
curMem=0, maxMem=278302556
15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0 stored as values in 
memory (estimated size 140.5 KB, free 265.3 MB)
15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(12633) called with 
curMem=143840, maxMem=278302556
15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in 
memory (estimated size 12.3 KB, free 265.3 MB)
15/08/06 10:37:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 
localhost:53311 (size: 12.3 KB, free: 265.4 MB)
15/08/06 10:37:03 INFO SparkContext: Created broadcast 0 from textFile at 
NativeMethodAccessorImpl.java:-2


I saw that there was an IP in these logs i.e. localhost:53311

I tried connecting to it via Google Chrome and got an exception.

 15/08/06 10:37:30 WARN TransportChannelHandler: Exception in connection 
 from /127.0.0.1:54056
io.netty.handler.codec.TooLongFrameException: Adjusted frame length exceeds 
2147483647: 5135603447292250196 - discarded
at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:501)
at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:477)
at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:403)
at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:343)
at 
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:249)
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:149)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at java.lang.Thread.run(Thread.java:745)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9678) Exception while going through quick start

2015-08-05 Thread Aseem Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aseem Bansal updated SPARK-9678:

Description: 
I was going through the quick start for spark 1.4.1 at 
http://spark.apache.org/docs/latest/quick-start.html. I am using pySpark. Also 
the exact version that I am using is spark-1.4.1-bin-hadoop2.4

The quick start has textFile = sc.textFile("README.md"). I ran that and then 
the following text appeared in the command line

{noformat}
15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(143840) called with 
curMem=0, maxMem=278302556
15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0 stored as values in 
memory (estimated size 140.5 KB, free 265.3 MB)
15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(12633) called with 
curMem=143840, maxMem=278302556
15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in 
memory (estimated size 12.3 KB, free 265.3 MB)
15/08/06 10:37:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 
localhost:53311 (size: 12.3 KB, free: 265.4 MB)
15/08/06 10:37:03 INFO SparkContext: Created broadcast 0 from textFile at 
NativeMethodAccessorImpl.java:-2
{noformat}

I saw that there was an IP in these logs i.e. localhost:53311

I tried connecting to it via Google Chrome and got an exception.

{noformat}
 15/08/06 10:37:30 WARN TransportChannelHandler: Exception in connection 
 from /127.0.0.1:54056
io.netty.handler.codec.TooLongFrameException: Adjusted frame length exceeds 
2147483647: 5135603447292250196 - discarded
at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:501)
at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:477)
at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:403)
at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:343)
at 
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:249)
at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:149)
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at java.lang.Thread.run(Thread.java:745)
{noformat}

  was:
I was going through the quick start for spark 1.4.1 at 
http://spark.apache.org/docs/latest/quick-start.html. I am using pySpark. Also 
the exact version that I am using is spark-1.4.1-bin-hadoop2.4

The quick start has textFile = sc.textFile(README.md). I ran that and then 
the following text appeared in the command line


15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(143840) called with 
curMem=0, maxMem=278302556
15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0 stored as values in 
memory (estimated size 140.5 KB, free 265.3 MB)
15/08/06 10:37:03 INFO MemoryStore: ensureFreeSpace(12633) called with 
curMem=143840, maxMem=278302556
15/08/06 10:37:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in 
memory (estimated size 12.3 KB, free 265.3 MB)
15/08/06 10:37:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 
localhost:53311 (size: 12.3 KB, free: 265.4 MB)
15/08/06 10:37:03 INFO SparkContext: Created broadcast 0 from textFile at 
NativeMethodAccessorImpl.java:-2


I saw that there was an IP in these logs i.e. localhost:53311

I tried connecting to it via Google Chrome and got an exception.

 15/08/06 10:37:30 WARN TransportChannelHandler: Exception in connection 
 from /127.0.0.1:54056
io.netty.handler.codec.TooLongFrameException: Adjusted frame length exceeds 
2147483647: 5135603447292250196 - discarded
at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.fail(LengthFieldBasedFrameDecoder.java:501)
at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.failIfNecessary(LengthFieldBasedFrameDecoder.java:477)
at 
io.netty.handler.codec.LengthFieldBasedFrameDecoder.decode(LengthFieldBasedFrameDecoder.java:403)
at 

[jira] [Commented] (KAFKA-2364) Improve documentation for contributing to docs

2015-07-31 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649430#comment-14649430
 ] 

Aseem Bansal commented on KAFKA-2364:
-

It says "Create a patch that applies cleanly against SVN trunk." I understand 
what that means, but isn't this process a bit too complex? Submitting patches 
to Groovy/Grails was very easy. If this is due to not having a Git mirror, 
then let me know how I can help.

I read https://blogs.apache.org/infra/entry/git_based_websites_available but I 
am not sure how I can help there. As per that post, it needs a ticket with the 
Apache Infra team. Do you mean the migration from SVN to Git? If yes, let me know.

 Improve documentation for contributing to docs
 --

 Key: KAFKA-2364
 URL: https://issues.apache.org/jira/browse/KAFKA-2364
 Project: Kafka
  Issue Type: Task
Reporter: Aseem Bansal
Priority: Minor
  Labels: doc

 While reading the documentation for kafka 8 I saw some improvements that can 
 be made. But the docs for contributing are not very good at 
 https://github.com/apache/kafka. It just gives me a URL for svn. But I am not 
 sure what to do. Can the README.MD file be improved for contributing to docs?
 I have submitted patches to groovy and grails by sending PRs via github but  
 looking at the comments on PRs submitted to kafka it seems PRs via github are 
 not working for kafka. It would be good to make that work also.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2364) Improve documentation for contributing to docs

2015-07-31 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649447#comment-14649447
 ] 

Aseem Bansal commented on KAFKA-2364:
-

You mean dev@kafka.apache.org? I found that on 
http://kafka.apache.org/contributing.html. Or should I start a discussion on 
https://groups.google.com/forum/#!forum/kafka-dev? 

I know you said email but I find forums easier.

 Improve documentation for contributing to docs
 --

 Key: KAFKA-2364
 URL: https://issues.apache.org/jira/browse/KAFKA-2364
 Project: Kafka
  Issue Type: Task
Reporter: Aseem Bansal
Priority: Minor
  Labels: doc

 While reading the documentation for kafka 8 I saw some improvements that can 
 be made. But the docs for contributing are not very good at 
 https://github.com/apache/kafka. It just gives me a URL for svn. But I am not 
 sure what to do. Can the README.MD file be improved for contributing to docs?
 I have submitted patches to groovy and grails by sending PRs via github but  
 looking at the comments on PRs submitted to kafka it seems PRs via github are 
 not working for kafka. It would be good to make that work also.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2364) Improve documentation for contributing to docs

2015-07-26 Thread Aseem Bansal (JIRA)
Aseem Bansal created KAFKA-2364:
---

 Summary: Improve documentation for contributing to docs
 Key: KAFKA-2364
 URL: https://issues.apache.org/jira/browse/KAFKA-2364
 Project: Kafka
  Issue Type: Task
Reporter: Aseem Bansal


While reading the documentation for kafka 8 I saw some improvements that can be 
made. But the docs for contributing are not very good at 
https://github.com/apache/kafka. It just gives me a URL for svn. But I am not 
sure what to do. Can the README.MD file be improved for contributing to docs?

I have submitted patches to groovy and grails by sending PRs via github but  
looking at the comments it seems PRs via github are not working for kafka. It 
would be good to make that work also.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2364) Improve documentation for contributing to docs

2015-07-26 Thread Aseem Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aseem Bansal updated KAFKA-2364:

Description: 
While reading the documentation for kafka 8 I saw some improvements that can be 
made. But the docs for contributing are not very good at 
https://github.com/apache/kafka. It just gives me a URL for svn. But I am not 
sure what to do. Can the README.MD file be improved for contributing to docs?

I have submitted patches to groovy and grails by sending PRs via github but  
looking at the comments on PRs submitted to kafka it seems PRs via github are 
not working for kafka. It would be good to make that work also.

  was:
While reading the documentation for kafka 8 I saw some improvements that can be 
made. But the docs for contributing are not very good at 
https://github.com/apache/kafka. It just gives me a URL for svn. But I am not 
sure what to do. Can the README.MD file be improved for contributing to docs?

I have submitted patches to groovy and grails by sending PRs via github but  
looking at the comments it seems PRs via github are not working for kafka. It 
would be good to make that work also.


 Improve documentation for contributing to docs
 --

 Key: KAFKA-2364
 URL: https://issues.apache.org/jira/browse/KAFKA-2364
 Project: Kafka
  Issue Type: Task
Reporter: Aseem Bansal
  Labels: doc

 While reading the documentation for kafka 8 I saw some improvements that can 
 be made. But the docs for contributing are not very good at 
 https://github.com/apache/kafka. It just gives me a URL for svn. But I am not 
 sure what to do. Can the README.MD file be improved for contributing to docs?
 I have submitted patches to groovy and grails by sending PRs via github but  
 looking at the comments on PRs submitted to kafka it seems PRs via github are 
 not working for kafka. It would be good to make that work also.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2364) Improve documentation for contributing to docs

2015-07-26 Thread Aseem Bansal (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aseem Bansal updated KAFKA-2364:

Priority: Minor  (was: Major)

 Improve documentation for contributing to docs
 --

 Key: KAFKA-2364
 URL: https://issues.apache.org/jira/browse/KAFKA-2364
 Project: Kafka
  Issue Type: Task
Reporter: Aseem Bansal
Priority: Minor
  Labels: doc

 While reading the documentation for kafka 8 I saw some improvements that can 
 be made. But the docs for contributing are not very good at 
 https://github.com/apache/kafka. It just gives me a URL for svn. But I am not 
 sure what to do. Can the README.MD file be improved for contributing to docs?
 I have submitted patches to groovy and grails by sending PRs via github but  
 looking at the comments on PRs submitted to kafka it seems PRs via github are 
 not working for kafka. It would be good to make that work also.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)