[jira] [Closed] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dev Lakhani closed SPARK-10798.
-------------------------------
    Resolution: Cannot Reproduce

> JsonMappingException with Spark Context Parallelize
> ---------------------------------------------------
>
>                 Key: SPARK-10798
>                 URL: https://issues.apache.org/jira/browse/SPARK-10798
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.0
>         Environment: Linux, Java 1.8.45
>            Reporter: Dev Lakhani
>
> When trying to create an RDD of Rows using a Java Spark Context, if I serialize the rows with Kryo first, the SparkContext fails:
>
> byte[] data = Kryo.serialize(List)
> List<Row> fromKryoRows = Kryo.unserialize(data)
> List<Row> rows = new Vector<>(); // using a new set of data
> rows.add(RowFactory.create("test"));
> javaSparkContext.parallelize(rows);
> OR
> javaSparkContext.parallelize(fromKryoRows); // using deserialized rows
>
> I get:
> com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class scala.Tuple2) (through reference chain: org.apache.spark.rdd.RDDOperationScope["parent"])
> 	at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:210)
> 	at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:177)
> 	at com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:187)
> 	at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:647)
> 	at com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:152)
> 	at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128)
> 	at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:2881)
> 	at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2338)
> 	at org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:50)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:141)
> 	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
> 	at org.apache.spark.SparkContext.withScope(SparkContext.scala:700)
> 	at org.apache.spark.SparkContext.parallelize(SparkContext.scala:714)
> 	at org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:145)
> 	at org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:157)
> 	...
> Caused by: scala.MatchError: (None,None) (of class scala.Tuple2)
> 	at com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply$mcV$sp(OptionSerializerModule.scala:32)
> 	at com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32)
> 	at com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32)
> 	at scala.Option.getOrElse(Option.scala:120)
> 	at com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:31)
> 	at com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:22)
> 	at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:505)
> 	at com.fasterxml.jackson.module.scala.ser.OptionPropertyWriter.serializeAsField(OptionSerializerModule.scala:128)
> 	at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:639)
> 	... 19 more
>
> I've tried updating jackson-module-scala to 2.6.1, but the same issue occurs. This happens in local mode with Java 1.8_45. I searched the web and this JIRA for similar issues but found nothing of interest.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
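The report's serialize/unserialize lines are shorthand rather than real API calls. As a hedged sketch of the shape of the repro only: Kryo and Spark's Row/RowFactory/parallelize are not available in a standalone snippet, so JDK serialization and plain strings stand in for them here, and every name below is illustrative, not the reporter's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

// Stand-in for the reporter's round trip: serialize a List to bytes,
// deserialize it, and then hand either list to parallelize(). JDK
// serialization replaces Kryo here purely so the sketch is self-contained.
public class RoundTrip {
    public static byte[] serialize(List<String> rows) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new ArrayList<>(rows));
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    public static List<String> deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (List<String>) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> rows = new ArrayList<>();
        rows.add("test");
        byte[] data = serialize(rows);
        List<String> fromBytes = deserialize(data);
        // In the report, either rows or fromBytes is then passed to
        // javaSparkContext.parallelize(...), which is where the
        // JsonMappingException was thrown.
        System.out.println(fromBytes); // prints [test]
    }
}
```

Note the reported exception comes from Spark's own RDDOperationScope JSON bookkeeping inside parallelize(), not from the user's serialization step, which is consistent with it later turning out to be a classpath issue.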
[jira] [Commented] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055130#comment-15055130 ]

Dev Lakhani commented on SPARK-10798:
-------------------------------------

byte[] data = Kryo.serialize(List) is just shorthand for new Kryo().serialize(). I think this was a classpath issue; I was not able to reproduce it, but if it reappears I will reopen this.
[jira] [Commented] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942651#comment-14942651 ]

Dev Lakhani commented on SPARK-10798:
-------------------------------------

Hi Miao, I will create a GitHub project/fork for this to give you the full sample soon. Thanks, Dev
[jira] [Updated] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dev Lakhani updated SPARK-10798:
--------------------------------
    Environment: Linux, Java 1.8.45  (was: Linux, Java 1.8.40)
[jira] [Updated] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dev Lakhani updated SPARK-10798:
--------------------------------
[jira] [Updated] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dev Lakhani updated SPARK-10798:
--------------------------------
[jira] [Created] (SPARK-10798) JsonMappingException with Spark Context Parallelize
Dev Lakhani created SPARK-10798:
-----------------------------------

             Summary: JsonMappingException with Spark Context Parallelize
                 Key: SPARK-10798
                 URL: https://issues.apache.org/jira/browse/SPARK-10798
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 1.5.0
         Environment: Linux, Java 1.8.40
            Reporter: Dev Lakhani

When trying to create an RDD of Rows using a Java Spark Context:

List<Row> rows = new Vector<>();
rows.add(RowFactory.create("test"));
javaSparkContext.parallelize(rows);

I get com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class scala.Tuple2) (through reference chain: org.apache.spark.rdd.RDDOperationScope["parent"]), caused by scala.MatchError: (None,None) (of class scala.Tuple2); the same stack trace is quoted in full in the entries above.

I've tried updating jackson-module-scala to 2.6.1, but the same issue occurs. This happens in local mode with Java 1.8_40.
[jira] [Updated] (SPARK-10700) Spark R Documentation not available
[ https://issues.apache.org/jira/browse/SPARK-10700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dev Lakhani updated SPARK-10700:
--------------------------------
    Description: 
The documentation at https://spark.apache.org/docs/latest/api/R/glm.html, referred to in https://spark.apache.org/docs/latest/sparkr.html, is not available. I searched this JIRA site for sparkr.html / SparkR documentation and do not think anyone else has raised this.

  was:
The documentation at https://spark.apache.org/docs/latest/sparkr.html is not available. I searched this JIRA site for sparkr.html / SparkR documentation and do not think anyone else has raised this.
[jira] [Created] (SPARK-10700) Spark R Documentation not available
Dev Lakhani created SPARK-10700:
-----------------------------------

             Summary: Spark R Documentation not available
                 Key: SPARK-10700
                 URL: https://issues.apache.org/jira/browse/SPARK-10700
             Project: Spark
          Issue Type: Bug
          Components: Documentation
    Affects Versions: 1.5.0
            Reporter: Dev Lakhani
            Priority: Minor

The documentation at https://spark.apache.org/docs/latest/sparkr.html is not available. I searched this JIRA site for sparkr.html / SparkR documentation and do not think anyone else has raised this.
[jira] [Commented] (IGNITE-974) Discovery issue
[ https://issues.apache.org/jira/browse/IGNITE-974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14701055#comment-14701055 ]

Dev Lakhani commented on IGNITE-974:
------------------------------------

I'm experiencing this too. I think part of the problem is that the message is a bit inaccurate. For example:

{code:title=TcpDiscoverySpi.java#L1403|borderStyle=solid}
catch (IOException | IgniteCheckedException e) {
    if (X.hasCause(e, SocketTimeoutException.class))
        LT.warn(log, null, "Timed out waiting for message to be read (most probably, the reason is " +
            "in long GC pauses on remote node) [curTimeout=" + timeout + ']');

    throw e;
}
{code}

In my case the exception e is of class org.apache.ignite.IgniteCheckedException: "Failed to deserialize object with given class loader: java.net.URLClassLoader". It has nothing to do with GC pauses, so maybe the underlying exception needs to be logged better.

> Discovery issue
> ---------------
>
>                 Key: IGNITE-974
>                 URL: https://issues.apache.org/jira/browse/IGNITE-974
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Pavel Konstantinov
>            Assignee: Semen Boikov
>            Priority: Blocker
>             Fix For: ignite-1.4
>
> I started the tester with load and got many messages like this in the node console:
> {code}
> [16:04:42,396][WARNING][tcp-disco-msg-worker-#5%tester][TcpDiscoverySpi] Timed out waiting for message to be read (most probably, the reason is in long GC pauses on remote node. Current timeout: 400.
> {code}
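The comment's point is that the warning text should depend on what is actually in the exception's cause chain. A minimal plain-JDK sketch of that check follows; the helper name and its semantics are assumptions standing in for Ignite's X.hasCause, and only java.* types are used:

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Hypothetical stand-in for Ignite's X.hasCause: walk the getCause() chain
// looking for a given throwable type, so a caller can warn about GC pauses
// only when a SocketTimeoutException is really present, and otherwise log
// the actual underlying cause.
public class CauseChain {
    public static boolean hasCause(Throwable t, Class<? extends Throwable> type) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (type.isInstance(cur)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // An IOException wrapping a timeout: the GC-pause warning would apply.
        Throwable timeout = new IOException(new SocketTimeoutException("read timed out"));
        // A deserialization failure: no timeout anywhere in the chain.
        Throwable deser = new RuntimeException("Failed to deserialize object with given class loader");

        System.out.println(hasCause(timeout, SocketTimeoutException.class)); // prints true
        System.out.println(hasCause(deser, SocketTimeoutException.class));   // prints false
    }
}
```

With a check like this, the warn branch could fall through to logging e itself when no SocketTimeoutException is found, which is what the comment asks for.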
[jira] [Commented] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data
[ https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602669#comment-14602669 ] Dev Lakhani commented on SPARK-1867: [~marcreichman] , [~meiyoula], [~srowen], [~sam] at a minimum, isn't it worth upvoting the JDK bug https://bugs.openjdk.java.net/browse/JDK-7172206, as this seems to be part of the problem? Spark Documentation Error causes java.lang.IllegalStateException: unread block data --- Key: SPARK-1867 URL: https://issues.apache.org/jira/browse/SPARK-1867 Project: Spark Issue Type: Bug Components: Spark Core Reporter: sam I've employed two System Administrators on a contract basis (for quite a bit of money), and both contractors have independently hit the following exception. What we are doing is: 1. Installing Spark 0.9.1 according to the documentation on the website, along with CDH4 (and another cluster with CDH5) distros of hadoop/hdfs. 2. Building a fat jar with a Spark app with sbt then trying to run it on the cluster I've also included code snippets, and sbt deps at the bottom. When I've Googled this, there seem to be two somewhat vague responses: a) Mismatching spark versions on nodes/user code b) Need to add more jars to the SparkConf Now I know that (b) is not the problem having successfully run the same code on other clusters while only including one jar (it's a fat jar). But I have no idea how to check for (a) - it appears Spark doesn't have any version checks or anything - it would be nice if it checked versions and threw a mismatching-version exception: "you have user code using version X and node Y has version Z". I would be very grateful for advice on this. 
The exception: Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 0.0:1 failed 32 times (most recent failure: Exception failure: java.lang.IllegalStateException: unread block data) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020) at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018) at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604) at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604) at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498) at akka.actor.ActorCell.invoke(ActorCell.scala:456) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) at akka.dispatch.Mailbox.run(Mailbox.scala:219) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) 14/05/16 18:05:31 INFO scheduler.TaskSetManager: Loss was due to java.lang.IllegalStateException: unread block data [duplicate 59] My code snippet: val conf = new SparkConf() .setMaster(clusterMaster) .setAppName(appName) .setSparkHome(sparkHome) .setJars(SparkContext.jarOfClass(this.getClass)) println("count = " + new SparkContext(conf).textFile(someHdfsPath).count()) My SBT dependencies: // relevant "org.apache.spark" % "spark-core_2.10" % "0.9.1", "org.apache.hadoop" % "hadoop-client" % "2.3.0-mr1-cdh5.0.0", // standard, probably unrelated "com.github.seratch" %% "awscala" % "[0.2,)", "org.scalacheck" %% "scalacheck" % "1.10.1" % "test", "org.specs2" %% "specs2" % "1.14" % "test", "org.scala-lang" % "scala-reflect" % "2.10.3", "org.scalaz" %% "scalaz-core" % "7.0.5", "net.minidev" % "json-smart" % "1.2"
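The reporter's wish above — that Spark compare versions up front and fail with a clear message — can be sketched as a tiny handshake check. All names here are illustrative, not Spark's actual API:

```java
public class VersionCheck {

    /**
     * Hypothetical handshake: fail fast when user code and a node disagree
     * on the Spark version, instead of dying later with an opaque
     * "unread block data" deserialization error.
     */
    static void checkVersions(String userVersion, String node, String nodeVersion) {
        if (!userVersion.equals(nodeVersion)) {
            throw new IllegalStateException(
                "Mismatching Spark versions: user code uses version " + userVersion
                + " but node " + node + " has version " + nodeVersion);
        }
    }
}
```

Run at task-submission time, a check like this would turn the vague response (a) above into an immediate, actionable error.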
[jira] [Commented] (HBASE-13825) Get operations on large objects fail with protocol errors
[ https://issues.apache.org/jira/browse/HBASE-13825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589712#comment-14589712 ] Dev Lakhani commented on HBASE-13825: - Hi [~mantonov], we don't have hbase.table.max.rowsize set, but we also don't see any RowTooBigExceptions being thrown in the region server logs (which, unfortunately, I cannot send out). Get operations on large objects fail with protocol errors - Key: HBASE-13825 URL: https://issues.apache.org/jira/browse/HBASE-13825 Project: HBase Issue Type: Bug Affects Versions: 1.0.0, 1.0.1 Reporter: Dev Lakhani When performing a get operation on a column family with more than 64 MB of data, the operation fails with: Caused by: Portable(java.io.IOException): Call to host:port failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1481) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1453) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:27308) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.get(ProtobufUtil.java:1381) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:753) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:751) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:120) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:756) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:765) at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:395) This may be related to 
https://issues.apache.org/jira/browse/HBASE-11747, but that issue concerns cluster status. Scan and put operations on the same data work fine. Tested on a 1.0.0 cluster with both 1.0.1 and 1.0.0 clients.
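Since the protobuf error message points at a hard 64 MB response limit, one client-side workaround is to page a wide row so no single Get response approaches that limit. The batching logic below is a sketch independent of any HBase API: the per-column sizes are assumed to be known in advance (e.g. from a prior size-only scan), and issuing one Get per returned index range is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

public class GetBatcher {

    /**
     * Greedily groups column sizes into batches whose total stays under the
     * RPC limit, so each batch can be fetched with its own Get instead of
     * one oversized response. Returns half-open index ranges [start, end).
     */
    static List<int[]> batch(long[] columnSizes, long limit) {
        List<int[]> batches = new ArrayList<>();
        int start = 0;
        long total = 0;
        for (int i = 0; i < columnSizes.length; i++) {
            // Close the current batch when adding this column would exceed
            // the limit; a single oversized column still forms its own batch.
            if (total + columnSizes[i] > limit && i > start) {
                batches.add(new int[] {start, i});
                start = i;
                total = 0;
            }
            total += columnSizes[i];
        }
        if (start < columnSizes.length) {
            batches.add(new int[] {start, columnSizes.length});
        }
        return batches;
    }
}
```

Note this only helps when the data is spread across many columns; a single 64 MB+ cell cannot be split this way and would still need the server-side limit raised.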
[jira] [Commented] (HBASE-13825) Get operations on large objects fail with protocol errors
[ https://issues.apache.org/jira/browse/HBASE-13825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589715#comment-14589715 ] Dev Lakhani commented on HBASE-13825: - Sorry for the multiple postings, slow internet connection so I retried adding the comment a few too many times. Get operations on large objects fail with protocol errors - Key: HBASE-13825 URL: https://issues.apache.org/jira/browse/HBASE-13825 Project: HBase Issue Type: Bug Affects Versions: 1.0.0, 1.0.1 Reporter: Dev Lakhani
[jira] [Created] (SPARK-8395) spark-submit documentation is incorrect
Dev Lakhani created SPARK-8395: -- Summary: spark-submit documentation is incorrect Key: SPARK-8395 URL: https://issues.apache.org/jira/browse/SPARK-8395 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 1.4.0 Reporter: Dev Lakhani Priority: Minor Using a fresh checkout of 1.4.0-bin-hadoop2.6, if you run ./start-slave.sh 1 spark://localhost:7077 you get: failed to launch org.apache.spark.deploy.worker.Worker: Default is conf/spark-defaults.conf. 15/06/16 13:11:08 INFO Utils: Shutdown hook called It seems the worker number is not being accepted as described here: https://spark.apache.org/docs/latest/spark-standalone.html The documentation says: ./sbin/start-slave.sh <worker#> <master-spark-URL> but the start-slave.sh script states: usage="Usage: start-slave.sh <spark-master-URL>" where <spark-master-URL> is like spark://localhost:7077. I have checked for similar issues using https://issues.apache.org/jira/browse/SPARK-6552?jql=text%20~%20%22start-slave%22 and found nothing similar, so am raising this as an issue.
[jira] [Resolved] (SPARK-8143) Spark application history cannot be found even for finished jobs
[ https://issues.apache.org/jira/browse/SPARK-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani resolved SPARK-8143. Resolution: Fixed Fix Version/s: 1.4.0 Verified: history for killed jobs is now available in the web UI. Spark application history cannot be found even for finished jobs Key: SPARK-8143 URL: https://issues.apache.org/jira/browse/SPARK-8143 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.0, 1.3.1 Reporter: Dev Lakhani Fix For: 1.4.0 Whenever a job is killed or finished, because of an application error or otherwise, and I then click on Application Detail UI, even though the job state is FINISHED, I get no log results and the message states: Application history not found for (app-xyz-abc). Application ABC is still in progress. And no logs are presented. I'm using spark.eventLog.enabled=true and spark.eventLog.dir=/tmp/spark, under which I see lots of files like app-2015xyz-abc.inprogress even though the job has failed or finished.
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580689#comment-14580689 ] Dev Lakhani commented on SPARK-8142: To clarify [~srowen]: 1) I meant the other way around: if we choose to use Apache Spark, which provides Apache Hadoop libs, and we then choose a Cloudera Hadoop distribution on (the rest of) our cluster and use Cloudera Hadoop clients in the application code, Spark will provide Apache Hadoop libs whereas our cluster will be CDH5. Is there any issue in doing this? We chose Apache Spark because CDH is a version behind the official Spark release and we don't want to wait for, say, DataFrames support. 2) If I mark my spark-core as provided right now, as we speak, my code compiles, but when I run my application in my IDE using Spark local I get: NoClassDefFoundError: org/apache/spark/api/java/function/Function. This is why I am suggesting we may need Maven profiles, one for local testing and one for deployment. So getting back to the issue raised in this JIRA, which we seem to be ignoring: even when Hadoop and Spark are provided and the HBase client/protocol/server is packaged, we run into SPARK-1867, which at its latest comment suggests a dependency is missing, resulting in the obscure exception. Whether this is on the Hadoop side or the Spark side is not known, but as that JIRA suggests, it was caused by a missing dependency. I cannot see this missing class/dependency exception anywhere in the Spark logs. This suggests that if anyone using Spark sets any of the userClassPath* options and misses out a primary, secondary or tertiary dependency, they will encounter SPARK-1867. Therefore we are stuck; any suggestions are welcome to overcome this. Either there is a need to make ChildFirstURLClassLoader ignore Spark and Hadoop libs, or to help Spark log what's causing SPARK-1867. 
Spark Job Fails with ResultTask ClassCastException -- Key: SPARK-8142 URL: https://issues.apache.org/jira/browse/SPARK-8142 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.1 Reporter: Dev Lakhani When running a Spark Job, I get no failures in the application code whatsoever but a weird ResultTask Class exception. In my job, I create an RDD from HBase and for each partition do a REST call on an API, using a REST client. This has worked in IntelliJ but when I deploy to a cluster using spark-submit.sh I get : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, host): java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) These are the configs I set to override the spark classpath because I want to use my own glassfish jersey version: sparkConf.set("spark.driver.userClassPathFirst", "true"); sparkConf.set("spark.executor.userClassPathFirst", "true"); I see no other warnings or errors in any of the logs. Unfortunately I cannot post my code, but please ask me questions that will help debug the issue. Using Spark 1.3.1, Hadoop 2.6.
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580730#comment-14580730 ] Dev Lakhani commented on SPARK-8142: Hi [~vanzin] bq. if you want to use the glassfish jersey version, you shouldn't need to do this, right? Spark depends on the old one that is under com.sun.*, IIRC. Yes, I need to make use of Glassfish 2.x in my application and not the provided sun.* one, but this could apply to any other dependency that needs to supersede Spark's. bq. marking all dependencies (including hbase) as provided and using {{spark. {driver,executor}.extraClassPath}} might be the easiest way out if you really need to use userClassPathFirst. This is an option, but it might be a challenge to scale if we have different folder layouts for the extraClassPath across clusters/nodes for the HBase and Hadoop installs. This can be (and usually is) the case when new servers are added to existing ones, for example. If one node had /disk4/path/to/hbase/libs and another /disk3/another/path/to/hbase/libs and so on, then the extraClassPath would need to include both and would grow significantly, with the spark-submit args along with it. Also, whenever we update HBase we would have to change this classpath. Maybe the ideal way is to have, as you suggest, a blacklist containing the Spark and Hadoop libs. Then we could put whatever we wanted into one uber/fat jar, and it wouldn't matter where HBase and Hadoop are installed or what's provided versus compiled; we would let Spark work it out. These are just my thoughts; I'm sure others will have different preferences and/or better approaches. Thanks anyway for your input on this JIRA. 
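The "blacklist" idea discussed above — child-first loading for user jars, but always parent-first for framework classes — can be sketched as a filtering classloader. This is an illustrative design only, not Spark's actual ChildFirstURLClassLoader; the prefix list is an assumption about which packages would need protecting:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class FilteringChildFirstLoader extends URLClassLoader {

    // Framework packages always resolved via the parent loader, so Spark,
    // Hadoop and core classes are never duplicated in the child loader
    // (duplication is what produces "ResultTask cannot be cast to Task").
    private static final String[] PARENT_FIRST = {
        "java.", "scala.", "org.apache.spark.", "org.apache.hadoop."
    };

    public FilteringChildFirstLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    static boolean isParentFirst(String name) {
        for (String prefix : PARENT_FIRST) {
            if (name.startsWith(prefix)) return true;
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            if (isParentFirst(name)) {
                return super.loadClass(name, resolve);  // normal parent-first delegation
            }
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    c = findClass(name);                // user jar (child) first
                } catch (ClassNotFoundException e) {
                    c = super.loadClass(name, resolve); // fall back to parent
                }
            }
            if (resolve) resolveClass(c);
            return c;
        }
    }
}
```

With this split, a user-supplied Jersey could shadow Spark's, while scheduler classes like ResultTask would always come from the one parent loader, avoiding the ClassCastException in this issue.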
[jira] [Commented] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data
[ https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14581130#comment-14581130 ] Dev Lakhani commented on SPARK-1867: Worth a new JIRA to suggest this? Spark Documentation Error causes java.lang.IllegalStateException: unread block data --- Key: SPARK-1867 URL: https://issues.apache.org/jira/browse/SPARK-1867 Project: Spark Issue Type: Bug Components: Spark Core Reporter: sam
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14580632#comment-14580632 ] Dev Lakhani commented on SPARK-8142: [~vanzin] [~srowen] Since this has been verified independently, there appears to be a limitation in the ChildFirstURLClassLoader class which may be causing this issue. The approach of marking Spark/Hadoop deps as provided may not be ideal because: 1) it requires a Maven profile for compilation/testing and deployment; 2) if we run into SPARK-1867 there is no easy way to spot missing dependencies; 3) if we are using cdh* versions of Hadoop (client/server), then Spark's provided Hadoop versions will differ from the CDH client being used. Spark Job Fails with ResultTask ClassCastException -- Key: SPARK-8142 URL: https://issues.apache.org/jira/browse/SPARK-8142 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.3.1 Reporter: Dev Lakhani
[jira] [Commented] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data
[ https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578745#comment-14578745 ] Dev Lakhani commented on SPARK-1867: Although this has been marked with not an issue, I agree with [~marcreichman] it is a very misleading error and there's often no way to figure out which classes are missing. There should be an explicit ClassNotFoundException or some other check or warning. Whenever dependencies are missing it needs to be actionable. Spark Documentation Error causes java.lang.IllegalStateException: unread block data --- Key: SPARK-1867 URL: https://issues.apache.org/jira/browse/SPARK-1867 Project: Spark Issue Type: Bug Components: Spark Core Reporter: sam I've employed two System Administrators on a contract basis (for quite a bit of money), and both contractors have independently hit the following exception. What we are doing is: 1. Installing Spark 0.9.1 according to the documentation on the website, along with CDH4 (and another cluster with CDH5) distros of hadoop/hdfs. 2. Building a fat jar with a Spark app with sbt then trying to run it on the cluster I've also included code snippets, and sbt deps at the bottom. When I've Googled this, there seems to be two somewhat vague responses: a) Mismatching spark versions on nodes/user code b) Need to add more jars to the SparkConf Now I know that (b) is not the problem having successfully run the same code on other clusters while only including one jar (it's a fat jar). But I have no idea how to check for (a) - it appears Spark doesn't have any version checks or anything - it would be nice if it checked versions and threw a mismatching version exception: you have user code using version X and node Y has version Z. I would be very grateful for advice on this. 
The exception:

Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 0.0:1 failed 32 times (most recent failure: Exception failure: java.lang.IllegalStateException: unread block data)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/05/16 18:05:31 INFO scheduler.TaskSetManager: Loss was due to java.lang.IllegalStateException: unread block data [duplicate 59]

My code snippet:

val conf = new SparkConf()
  .setMaster(clusterMaster)
  .setAppName(appName)
  .setSparkHome(sparkHome)
  .setJars(SparkContext.jarOfClass(this.getClass))
println("count = " + new SparkContext(conf).textFile(someHdfsPath).count())

My SBT dependencies:

// relevant
"org.apache.spark" % "spark-core_2.10" % "0.9.1",
"org.apache.hadoop" % "hadoop-client" % "2.3.0-mr1-cdh5.0.0",
// standard, probably unrelated
"com.github.seratch" %% "awscala" % "[0.2,)",
"org.scalacheck" %% "scalacheck" % "1.10.1" % "test",
"org.specs2" %% "specs2" % "1.14" % "test",
"org.scala-lang" % "scala-reflect" % "2.10.3",
"org.scalaz" %% "scalaz-core" % "7.0.5",
"net.minidev" % "json-smart" % "1.2"
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578916#comment-14578916 ] Dev Lakhani commented on SPARK-8142:

Any suggestion on this? To summarise: Spark and Hadoop are marked as provided. Now running into https://issues.apache.org/jira/browse/SPARK-1867; some missing dependency is causing this and there is no indication what it is. This is becoming a blocker for our organisation.

Spark Job Fails with ResultTask ClassCastException
Key: SPARK-8142
URL: https://issues.apache.org/jira/browse/SPARK-8142
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.3.1
Reporter: Dev Lakhani
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14576816#comment-14576816 ] Dev Lakhani commented on SPARK-8142:

Also, I can confirm that without the userClassPathFirst settings the job runs, but fails at a different point where my Jersey version clashes with my application code. So it seems to be an issue with the userClassPathFirst setting: when it is set, I get the ClassCastException.

Spark Job Fails with ResultTask ClassCastException
Key: SPARK-8142
URL: https://issues.apache.org/jira/browse/SPARK-8142
Project: Spark
Issue Type: Bug
Affects Versions: 1.3.1
Reporter: Dev Lakhani
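A common alternative to userClassPathFirst for exactly this kind of Jersey clash is to relocate the application's Jersey packages inside the uber jar, so they can no longer collide with the com.sun.jersey 1.9 on Spark's classpath. A hedged sketch of a maven-shade-plugin configuration; the shaded package name and plugin version are illustrative assumptions, not from the report:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <!-- Move the application's Jersey 2.x under a private package name
               so it cannot clash with the Jersey 1.9 on Spark's classpath. -->
          <relocation>
            <pattern>org.glassfish.jersey</pattern>
            <shadedPattern>shaded.org.glassfish.jersey</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With the relocation in place, the experimental userClassPathFirst flags can usually be dropped entirely, since the two Jersey versions then live under different package names.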
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577230#comment-14577230 ] Dev Lakhani commented on SPARK-8142:

I thought it was only Spark deps, but I've now removed all HBase and Hadoop deps from my uber jar. Now when I run the job it cannot locate all the relevant HBase client classes and deps without specifying each hbase client/server/protocol jar using --driver-class-path. Is there some Spark environment variable I can set to point to all jars under a folder, or will I have to add all 20+ HBase libs using the driver class path option? I know about SPARK_CLASSPATH but need a more elegant solution than referencing all HBase and Hadoop jars myself. HADOOP_HOME, HADOOP_CLASSPATH and HADOOP_CONF_DIR are already set.

Spark Job Fails with ResultTask ClassCastException
Key: SPARK-8142
URL: https://issues.apache.org/jira/browse/SPARK-8142
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.3.1
Reporter: Dev Lakhani
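On the question above about pointing Spark at a whole directory of HBase jars: besides SPARK_CLASSPATH, Spark's extraClassPath properties accept a classpath entry with a jar wildcard, which avoids listing each jar individually. A sketch for spark-defaults.conf; /opt/hbase/lib is a placeholder for wherever the HBase client jars actually live:

```
spark.driver.extraClassPath   /opt/hbase/lib/*
spark.executor.extraClassPath /opt/hbase/lib/*
```

The same values can be passed per job via spark-submit's --conf flag instead of editing the defaults file.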
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577262#comment-14577262 ] Dev Lakhani commented on SPARK-8142:

HBase kept; Hadoop and Spark removed. Now I get: ClassCastException: org.apache.hadoop.hbase.mapreduce.TableSplit cannot be cast to org.apache.hadoop.hbase.mapreduce.InputSplit at NewHadoopRDD.scala:115. This used to work when I had all Spark and Hadoop dependencies added in the uber jar.

Spark Job Fails with ResultTask ClassCastException
Key: SPARK-8142
URL: https://issues.apache.org/jira/browse/SPARK-8142
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.3.1
Reporter: Dev Lakhani
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577265#comment-14577265 ] Dev Lakhani commented on SPARK-8142:

To be specific: when I say "removed", I mean marked as provided.

Spark Job Fails with ResultTask ClassCastException
Key: SPARK-8142
URL: https://issues.apache.org/jira/browse/SPARK-8142
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.3.1
Reporter: Dev Lakhani
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577108#comment-14577108 ] Dev Lakhani commented on SPARK-8142:

I am not bundling Spark. Again, I have pre-built binaries downloaded from the Spark website, deployed on my cluster: version 1.3.1, Hadoop 2.6. I am using the Java Maven client dependency for org.apache.spark spark-core_2.10 version 1.3.1 in my application, added as a Maven dependency. I package my application using mvn shade; this builds me a jar with my application's code, Glassfish Jersey 2.7 deps and Spark 1.3.1 core deps. I then spark-submit my job jar using spark-submit (from Spark 1.3.1), and the ClassCastException occurs if I have userClassPathFirst set to true as described above. If I don't, my REST client tries to do a get operation using the Glassfish Jersey 2.7 API, but this conflicts with the com.sun.jersey 1.9 which comes with Spark.

I am using one version of Spark, 1.3.1-hadoop2.6, on my cluster; I have no other versions of Spark on that cluster. The RELEASE file states 1.3.1 (git revision 908a0bf) built for Hadoop 2.6.0. Confirmed on all nodes. I did a Maven dependency tree on my application code and the only Spark version is 1.3.1 in all of the Maven dependencies that I use: Spark SQL 1.3.1, Spark Core 1.3.1, Spark Network Common 1.3.1. I've been using this version fine for all other non-REST-based operations and other Spark operations; it's only when I use userClassPathFirst that I get this error.

Spark Job Fails with ResultTask ClassCastException
Key: SPARK-8142
URL: https://issues.apache.org/jira/browse/SPARK-8142
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.3.1
Reporter: Dev Lakhani
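"Marked as provided" in this thread refers to the Maven dependency scope: the dependency is available at compile time but left out of the shaded uber jar, so only the cluster's own copy is on the executor classpath. A minimal pom.xml fragment matching the versions discussed above:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.3.1</version>
  <!-- provided: compiled against, but excluded from the shaded uber jar -->
  <scope>provided</scope>
</dependency>
```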
[jira] [Commented] (SPARK-8143) Spark application history cannot be found even for finished jobs
[ https://issues.apache.org/jira/browse/SPARK-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577130#comment-14577130 ] Dev Lakhani commented on SPARK-8143:

I cannot try it against master because we are restricted to official releases only; we are not allowed external git access due to organisational constraints. If you also have an expectation for users to continually build from master to verify the existence of bugs and JIRAs, perhaps you need to mandate that process in https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-JIRA to be explicit, as different projects have different rules. In this case, I accept that I did not research this issue before posting it; if that is the case, close this as a duplicate, since I cannot verify against master.

Spark application history cannot be found even for finished jobs
Key: SPARK-8143
URL: https://issues.apache.org/jira/browse/SPARK-8143
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.3.0, 1.3.1
Reporter: Dev Lakhani

Whenever a job is killed or finished, because of an application error or otherwise, and I then click on Application Detail UI, even though the job state is FINISHED, I get no log results and the message states: "Application history not found for (app-xyz-abc). Application ABC is still in progress." And no logs are presented. I'm using spark.eventLog.enabled=true and spark.eventLog.dir=/tmp/spark, under which I see lots of files app-2015xyz-abc.inprogress even though the job has failed or finished.
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577155#comment-14577155 ] Dev Lakhani commented on SPARK-8142:

spark-core and spark-sql marked as provided, same error.

Spark Job Fails with ResultTask ClassCastException
Key: SPARK-8142
URL: https://issues.apache.org/jira/browse/SPARK-8142
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.3.1
Reporter: Dev Lakhani
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577371#comment-14577371 ] Dev Lakhani commented on SPARK-8142:

Update: having resolved some dependency issues, the current state is this:
- hadoop-common 2.6.0: provided
- hadoop-client 2.6.0: provided
- hadoop-hdfs 2.6.0: provided
- spark-sql_2.10: provided
- spark-core_2.10: provided
- hbase-client 1.1.0: included/packaged
- hbase-protocol 1.1.0: included/packaged
- hbase-server 1.1.0: included/packaged

I run the job and run into https://issues.apache.org/jira/browse/SPARK-1867, which suggests a class is missing. How do I find which one? There is no ClassNotFoundException, but something might be missing; how can I find this out?

Spark Job Fails with ResultTask ClassCastException
Key: SPARK-8142
URL: https://issues.apache.org/jira/browse/SPARK-8142
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.3.1
Reporter: Dev Lakhani
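On the question of how to find which class is missing or is being loaded from the wrong place: a small stdlib-only probe can report whether a class name resolves on the current classpath and, if so, which jar it was loaded from. This is an illustrative helper, not part of Spark; run it with the same classpath the job uses (e.g. via spark-submit or the driver JVM):

```java
// Illustrative classpath probe (not part of Spark): for each class name given,
// reports whether it resolves and which code source (jar) it was loaded from.
public class ClassLocator {
    public static String locate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            java.security.CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Core JDK classes come from the bootstrap loader and have no code source.
            return src != null ? src.getLocation().toString() : "bootstrap classloader";
        } catch (ClassNotFoundException e) {
            return "NOT FOUND";
        }
    }

    public static void main(String[] args) {
        // Default probes are examples; pass class names as arguments to override.
        String[] names = args.length > 0 ? args
                : new String[] {"java.lang.String", "org.apache.hadoop.hbase.client.HTable"};
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Probing the HBase and Hadoop classes named in the stack traces this way narrows down which jar (if any) is supplying them under the userClassPathFirst setting.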
[jira] [Updated] (SPARK-8142) Spark Job Fails with ResultTask Class Cast Exception
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated SPARK-8142:

Summary: Spark Job Fails with ResultTask Class Cast Exception (was: Spark Job Fails with ResultTask Class Exception)

Spark Job Fails with ResultTask Class Cast Exception
Key: SPARK-8142
URL: https://issues.apache.org/jira/browse/SPARK-8142
Project: Spark
Issue Type: Bug
Affects Versions: 1.3.1
Reporter: Dev Lakhani
[jira] [Created] (SPARK-8142) Spark Job Fails with ResultTask Class Exception
Dev Lakhani created SPARK-8142:

Summary: Spark Job Fails with ResultTask Class Exception
Key: SPARK-8142
URL: https://issues.apache.org/jira/browse/SPARK-8142
Project: Spark
Issue Type: Bug
Affects Versions: 1.3.1
Reporter: Dev Lakhani

When running a Spark job, I get no failures in the application code whatsoever, but a weird ResultTask class exception. In my job I create an RDD from HBase and, for each partition, do a REST call on an API using a REST client. This has worked in IntelliJ, but when I deploy to a cluster using spark-submit.sh I get:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, host): java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

These are the configs I set to override the Spark classpath, because I want to use my own Glassfish Jersey version:

sparkConf.set("spark.driver.userClassPathFirst", "true");
sparkConf.set("spark.executor.userClassPathFirst", "true");

I see no other warnings or errors in any of the logs. Unfortunately I cannot post my code, but please ask me questions that will help debug the issue. Using Spark 1.3, Hadoop 2.6.
[jira] [Created] (SPARK-8143) Spark application history cannot be found even for finished jobs
Dev Lakhani created SPARK-8143:

Summary: Spark application history cannot be found even for finished jobs
Key: SPARK-8143
URL: https://issues.apache.org/jira/browse/SPARK-8143
Project: Spark
Issue Type: Bug
Affects Versions: 1.3.1, 1.3.0
Reporter: Dev Lakhani

Whenever a job is killed or finished, because of an application error or otherwise, and I then click on Application Detail UI, even though the job state is FINISHED, I get no log results and the message states: "Application history not found for (app-xyz-abc). Application ABC is still in progress." And no logs are presented. I'm using spark.eventLog.enabled=true and spark.eventLog.dir=/tmp/spark, under which I see lots of files app-2015xyz-abc.inprogress even though the job has failed or finished.
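For the event-log setup described in this issue, the history server must read the same directory the application writes to, and a log that stays in the .inprogress state often indicates the application exited without stopping its SparkContext cleanly. A sketch of matching spark-defaults.conf entries; the /tmp/spark path mirrors the one in the report:

```
spark.eventLog.enabled           true
spark.eventLog.dir               /tmp/spark
spark.history.fs.logDirectory    /tmp/spark
```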
[jira] [Commented] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575958#comment-14575958 ] Dev Lakhani commented on SPARK-8142:

I'm using pre-compiled binaries for Spark 1.3.1, Hadoop 2.6, and checked the Spark versions against my application. As I mention, I need to use my classpath, not Spark's, hence the setting. If I don't, Spark makes use of Jersey 1.9 (the <jersey.version>1.9</jersey.version> property in https://github.com/apache/spark/blob/master/yarn/pom.xml), which is not compatible with my application code. I know the userClassPathFirst settings are experimental, but they are required for my use case.

Spark Job Fails with ResultTask ClassCastException
Key: SPARK-8142
URL: https://issues.apache.org/jira/browse/SPARK-8142
Project: Spark
Issue Type: Bug
Affects Versions: 1.3.1
Reporter: Dev Lakhani
[jira] [Commented] (SPARK-8143) Spark application history cannot be found even for finished jobs
[ https://issues.apache.org/jira/browse/SPARK-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575959#comment-14575959 ] Dev Lakhani commented on SPARK-8143:

Ok will wait for the official 1.4 release and will confirm.

Spark application history cannot be found even for finished jobs
Key: SPARK-8143
URL: https://issues.apache.org/jira/browse/SPARK-8143
Project: Spark
Issue Type: Bug
Affects Versions: 1.3.0, 1.3.1
Reporter: Dev Lakhani
[jira] [Commented] (HBASE-13825) Get operations on large objects fail with protocol errors
[ https://issues.apache.org/jira/browse/HBASE-13825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575960#comment-14575960 ] Dev Lakhani commented on HBASE-13825: - This probably needs to be configurable as an HBase option, as we cannot change client code. Get operations on large objects fail with protocol errors - Key: HBASE-13825 URL: https://issues.apache.org/jira/browse/HBASE-13825 Project: HBase Issue Type: Bug Affects Versions: 1.0.0, 1.0.1 Reporter: Dev Lakhani When performing a get operation on a column family with more than 64MB of data, the operation fails with: Caused by: Portable(java.io.IOException): Call to host:port failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1481) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1453) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:27308) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.get(ProtobufUtil.java:1381) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:753) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:751) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:120) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:756) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:765) at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:395) This may be related to https://issues.apache.org/jira/browse/HBASE-11747 but that issue is related to cluster status.
Scan and put operations on the same data work fine. Tested on a 1.0.0 cluster with both 1.0.1 and 1.0.0 clients. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SPARK-8142) Spark Job Fails with ResultTask Class Cast Exception
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated SPARK-8142: --- Description: When running a Spark Job, I get no failures in the application code whatsoever but a weird ResultTask Class exception. In my job I run create an RDD from HBase and for each partition do a REST call on an API, using a REST client. This has worked in IntelliJ but when I deploy to a cluster using spark-submit.sh I get : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, host): java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) These are the configs I set to override the spark classpath because I want to use my own glassfish jersey version: sparkConf.set(spark.driver.userClassPathFirst,true); sparkConf.set(spark.executor.userClassPathFirst,true); I see no other warnings or errors in any of the logs. Unfortunately I cannot post my code, but please ask me questions that will help debug the issue. Using spark 1.3.1 hadoop 2.6. was: When running a Spark Job, I get no failures in the application code whatsoever but a weird ResultTask Class exception. In my job I run create an RDD from HBase and for each partition do a REST call on an API, using a REST client. 
This has worked in IntelliJ but when I deploy to a cluster using spark-submit.sh I get : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, host): java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) These are the configs I set to override the spark classpath because I want to use my own glassfish jersey version: sparkConf.set(spark.driver.userClassPathFirst,true); sparkConf.set(spark.executor.userClassPathFirst,true); I see no other warnings or errors in any of the logs. Unfortunately I cannot post my code, but please ask me questions that will help debug the issue. Using spark 1.3 hadoop 2.6. Spark Job Fails with ResultTask Class Cast Exception Key: SPARK-8142 URL: https://issues.apache.org/jira/browse/SPARK-8142 Project: Spark Issue Type: Bug Affects Versions: 1.3.1 Reporter: Dev Lakhani When running a Spark Job, I get no failures in the application code whatsoever but a weird ResultTask Class exception. In my job I run create an RDD from HBase and for each partition do a REST call on an API, using a REST client. 
This has worked in IntelliJ but when I deploy to a cluster using spark-submit.sh I get : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, host): java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) These are the configs I set to override the spark classpath because I want to use my own glassfish jersey version: sparkConf.set(spark.driver.userClassPathFirst,true); sparkConf.set(spark.executor.userClassPathFirst,true); I see no other warnings or errors in any of the logs. Unfortunately I cannot post my code, but please ask me questions that will help debug the issue. Using spark 1.3.1 hadoop 2.6. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated SPARK-8142: --- Summary: Spark Job Fails with ResultTask ClassCastException (was: Spark Job Fails with ResultTask Class Cast Exception) Spark Job Fails with ResultTask ClassCastException -- Key: SPARK-8142 URL: https://issues.apache.org/jira/browse/SPARK-8142 Project: Spark Issue Type: Bug Affects Versions: 1.3.1 Reporter: Dev Lakhani When running a Spark Job, I get no failures in the application code whatsoever but a weird ResultTask Class exception. In my job I run create an RDD from HBase and for each partition do a REST call on an API, using a REST client. This has worked in IntelliJ but when I deploy to a cluster using spark-submit.sh I get : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, host): java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) These are the configs I set to override the spark classpath because I want to use my own glassfish jersey version: sparkConf.set(spark.driver.userClassPathFirst,true); sparkConf.set(spark.executor.userClassPathFirst,true); I see no other warnings or errors in any of the logs. Unfortunately I cannot post my code, but please ask me questions that will help debug the issue. Using spark 1.3.1 hadoop 2.6. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8142) Spark Job Fails with ResultTask ClassCastException
[ https://issues.apache.org/jira/browse/SPARK-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated SPARK-8142: --- Description: When running a Spark Job, I get no failures in the application code whatsoever but a weird ResultTask Class exception. In my job, I create a RDD from HBase and for each partition do a REST call on an API, using a REST client. This has worked in IntelliJ but when I deploy to a cluster using spark-submit.sh I get : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, host): java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) These are the configs I set to override the spark classpath because I want to use my own glassfish jersey version: sparkConf.set(spark.driver.userClassPathFirst,true); sparkConf.set(spark.executor.userClassPathFirst,true); I see no other warnings or errors in any of the logs. Unfortunately I cannot post my code, but please ask me questions that will help debug the issue. Using spark 1.3.1 hadoop 2.6. was: When running a Spark Job, I get no failures in the application code whatsoever but a weird ResultTask Class exception. In my job I run create an RDD from HBase and for each partition do a REST call on an API, using a REST client. 
This has worked in IntelliJ but when I deploy to a cluster using spark-submit.sh I get : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, host): java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) These are the configs I set to override the spark classpath because I want to use my own glassfish jersey version: sparkConf.set(spark.driver.userClassPathFirst,true); sparkConf.set(spark.executor.userClassPathFirst,true); I see no other warnings or errors in any of the logs. Unfortunately I cannot post my code, but please ask me questions that will help debug the issue. Using spark 1.3.1 hadoop 2.6. Spark Job Fails with ResultTask ClassCastException -- Key: SPARK-8142 URL: https://issues.apache.org/jira/browse/SPARK-8142 Project: Spark Issue Type: Bug Affects Versions: 1.3.1 Reporter: Dev Lakhani When running a Spark Job, I get no failures in the application code whatsoever but a weird ResultTask Class exception. In my job, I create a RDD from HBase and for each partition do a REST call on an API, using a REST client. 
This has worked in IntelliJ but when I deploy to a cluster using spark-submit.sh I get : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, host): java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) These are the configs I set to override the spark classpath because I want to use my own glassfish jersey version: sparkConf.set(spark.driver.userClassPathFirst,true); sparkConf.set(spark.executor.userClassPathFirst,true); I see no other warnings or errors in any of the logs. Unfortunately I cannot post my code, but please ask me questions that will help debug the issue. Using spark 1.3.1 hadoop 2.6. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
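The classpath-precedence settings quoted in this thread (shown in the report without string quoting around the property names) can equivalently be placed in spark-defaults.conf; a sketch using the same property names:

```
spark.driver.userClassPathFirst    true
spark.executor.userClassPathFirst  true
```

Note that in Spark 1.3 these properties are marked experimental, and user-first class loading is itself a common source of ClassCastException errors such as the one reported here, since an application jar that bundles Spark classes can cause the same class to be loaded by two different classloaders.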
[jira] [Commented] (HBASE-13825) Get operations on large objects fail with protocol errors
[ https://issues.apache.org/jira/browse/HBASE-13825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570472#comment-14570472 ] Dev Lakhani commented on HBASE-13825: - Thanks for the suggestion [~apurtell]; this is what the stack trace suggests, but can you help with a code snippet? When you say change it in the client, do you mean the HBase client or the application client calling the get? I am only able/permitted to use pre-built HBase jars from Maven, so I cannot change HBase code in any way. The error message suggests CodedInputStream.setSizeLimit() as if it were a static method, which does not exist. Furthermore, I have no instances of CodedInputStream in my application client, so where should I set this size limit? Is it worth adding an HBase parameter for this? Get operations on large objects fail with protocol errors - Key: HBASE-13825 URL: https://issues.apache.org/jira/browse/HBASE-13825 Project: HBase Issue Type: Bug Affects Versions: 1.0.0, 1.0.1 Reporter: Dev Lakhani When performing a get operation on a column family with more than 64MB of data, the operation fails with: Caused by: Portable(java.io.IOException): Call to host:port failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1481) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1453) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:27308) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.get(ProtobufUtil.java:1381) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:753) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:751) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:120) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:756) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:765) at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:395) This may be related to https://issues.apache.org/jira/browse/HBASE-11747 but that issue is related to cluster status. Scan and put operations on the same data work fine Tested on a 1.0.0 cluster with both 1.0.1 and 1.0.0 clients. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13825) Get operations on large objects fail with protocol errors
[ https://issues.apache.org/jira/browse/HBASE-13825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated HBASE-13825: Description: When performing a get operation on a column family with more than 64MB of data, the operation fails with: Caused by: Portable(java.io.IOException): Call to host:port failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1481) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1453) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:27308) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.get(ProtobufUtil.java:1381) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:753) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:751) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:120) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:756) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:765) at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:395) This may be related to https://issues.apache.org/jira/browse/HBASE-11747 but that issue is related to cluster status. Scan and put operations on the same data work fine Tested on a 1.0.0 cluster with both 1.0.1 and 1.0.0 clients. was: When performing a get operation on a column family with more than 64MB of data, the operation fails with: Caused by: Portable(java.io.IOException): Call to host:port failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. 
May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1481) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1453) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:27308) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.get(ProtobufUtil.java:1381) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:753) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:751) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:120) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:756) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:765) at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:395) This may be related to https://issues.apache.org/jira/browse/HBASE-11747 but that issue is related to cluster status. Can and put operations on the same data work fine Tested on a 1.0.0 cluster with both 1.0.1 and 1.0.0 clients. Get operations on large objects fail with protocol errors - Key: HBASE-13825 URL: https://issues.apache.org/jira/browse/HBASE-13825 Project: HBase Issue Type: Bug Affects Versions: 1.0.0, 1.0.1 Reporter: Dev Lakhani When performing a get operation on a column family with more than 64MB of data, the operation fails with: Caused by: Portable(java.io.IOException): Call to host:port failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. 
at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1481) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1453) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:27308) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.get(ProtobufUtil.java:1381) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:753) at
[jira] [Created] (HBASE-13825) Get operations on large objects fail with protocol errors
Dev Lakhani created HBASE-13825: --- Summary: Get operations on large objects fail with protocol errors Key: HBASE-13825 URL: https://issues.apache.org/jira/browse/HBASE-13825 Project: HBase Issue Type: Bug Affects Versions: 1.0.1, 1.0.0 Reporter: Dev Lakhani When performing a get operation on a column family with more than 64MB of data, the operation fails with: Caused by: Portable(java.io.IOException): Call to host:port failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1481) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1453) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:27308) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.get(ProtobufUtil.java:1381) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:753) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:751) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:120) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:756) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:765) at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:395) This may be related to https://issues.apache.org/jira/browse/HBASE-11747 but that issues is related to cluster status. Tested on a 1.0.0 cluster with both 1.0.1 and 1.0.0 clients. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13825) Get operations on large objects fail with protocol errors
[ https://issues.apache.org/jira/browse/HBASE-13825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated HBASE-13825: Description: When performing a get operation on a column family with more than 64MB of data, the operation fails with: Caused by: Portable(java.io.IOException): Call to host:port failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1481) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1453) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:27308) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.get(ProtobufUtil.java:1381) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:753) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:751) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:120) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:756) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:765) at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:395) This may be related to https://issues.apache.org/jira/browse/HBASE-11747 but that issue is related to cluster status. Can and put operations on the same data work fine Tested on a 1.0.0 cluster with both 1.0.1 and 1.0.0 clients. was: When performing a get operation on a column family with more than 64MB of data, the operation fails with: Caused by: Portable(java.io.IOException): Call to host:port failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. 
May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1481) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1453) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:27308) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.get(ProtobufUtil.java:1381) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:753) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:751) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:120) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:756) at org.apache.hadoop.hbase.client.HTable.get(HTable.java:765) at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:395) This may be related to https://issues.apache.org/jira/browse/HBASE-11747 but that issues is related to cluster status. Tested on a 1.0.0 cluster with both 1.0.1 and 1.0.0 clients. Get operations on large objects fail with protocol errors - Key: HBASE-13825 URL: https://issues.apache.org/jira/browse/HBASE-13825 Project: HBase Issue Type: Bug Affects Versions: 1.0.0, 1.0.1 Reporter: Dev Lakhani When performing a get operation on a column family with more than 64MB of data, the operation fails with: Caused by: Portable(java.io.IOException): Call to host:port failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. 
at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1481) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1453) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1653) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1711) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:27308) at org.apache.hadoop.hbase.protobuf.ProtobufUtil.get(ProtobufUtil.java:1381) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:753) at org.apache.hadoop.hbase.client.HTable$3.call(HTable.java:751) at
[jira] [Created] (HBASE-13695) Cannot timeout Hbase bulk operations
Dev Lakhani created HBASE-13695: --- Summary: Cannot timeout Hbase bulk operations Key: HBASE-13695 URL: https://issues.apache.org/jira/browse/HBASE-13695 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Dev Lakhani Using the HBase 1.0.0 client. In HTable there is a batch() operation which calls AsyncRequest ars ... ars.waitUntilDone(). This invokes waitUntilDone with Long.MAX_VALUE. Does this mean batch operations cannot be interrupted or invoked with a timeout? We are seeing some batch operations take so long that our client hangs forever in "Waiting for ... actions to finish". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13695) Cannot timeout or interrupt Hbase bulk/batch operations
[ https://issues.apache.org/jira/browse/HBASE-13695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated HBASE-13695: Summary: Cannot timeout or interrupt Hbase bulk/batch operations (was: Cannot timeout Hbase bulk operations) Cannot timeout or interrupt Hbase bulk/batch operations --- Key: HBASE-13695 URL: https://issues.apache.org/jira/browse/HBASE-13695 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Dev Lakhani Using the HBase 1.0.0 client. In HTable there is a batch() operation which calls AsyncRequest ars ... ars.waitUntilDone(). This invokes waitUntilDone with Long.MAX_VALUE. Does this mean batch operations cannot be interrupted or invoked with a timeout? We are seeing some batch operations take so long that our client hangs forever in "Waiting for ... actions to finish". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
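Pending a fix, the batch() hang described in this thread can be bounded from the application side by running the call on a separate thread and timing out the wait. A sketch using only the JDK; BatchTimeout and callWithTimeout are illustrative names, and the Callable bodies stand in for the real HTable.batch() call:

```java
import java.util.concurrent.*;

// Client-side workaround sketch for HBASE-13695: HTable.batch() waits with
// Long.MAX_VALUE internally, so we bound the wait ourselves with Future.get.
public class BatchTimeout {
    public static <T> T callWithTimeout(Callable<T> call, long timeoutMs) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<T> f = pool.submit(call);
            try {
                return f.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                f.cancel(true); // interrupts the worker thread
                throw e;
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A fast call completes normally...
        System.out.println(callWithTimeout(() -> "done", 1000));
        // ...while a call that would hang is cut off after the timeout.
        try {
            callWithTimeout(() -> { Thread.sleep(60_000); return "never"; }, 100);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

Note the caveat: cancel(true) only interrupts the client-side thread; any operations already sent to the region servers may still complete server-side.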
[jira] [Commented] (SPARK-6846) Stage kill URL easy to accidentally trigger and possibility for security issue.
[ https://issues.apache.org/jira/browse/SPARK-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543970#comment-14543970 ] Dev Lakhani commented on SPARK-6846: As a more complex solution, would it be possible to have a unique stage id and use that in the URL? http://localhost:4040/stages/kill/?id=0&stage-id=UNIQUE-STAGE-ID&terminate=true A simple autocomplete of a previous kill command in Chrome, followed by Enter, can kill hours' worth of work. Or any other ideas? Stage kill URL easy to accidentally trigger and possibility for security issue. --- Key: SPARK-6846 URL: https://issues.apache.org/jira/browse/SPARK-6846 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 1.3.0 Reporter: Dev Lakhani Assignee: Sean Owen Priority: Minor Fix For: 1.4.0 On a similar note: When the kill link is cached in the browser bar, it's easy to accidentally kill a job just by pressing enter. For example: You press the kill stage button and get the prompt whether you want to kill the stage. You launch a new job and start typing: http://localhost:4040/ Chrome, for example, starts auto-completing with http://localhost:4040/stages/kill/?id=0&terminate=true If you accidentally press Enter, it will kill the current stage without any prompts. I think it's also a bit of a security issue if from any host you can curl/wget/issue: http://localhost:4040/stages/kill/?id=0&terminate=true and it will kill the current stage without prompting. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (HBASE-13449) HBASE_REGIONSERVER_OPTS in local-regionservers.sh is masked and silently reset.
[ https://issues.apache.org/jira/browse/HBASE-13449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543965#comment-14543965 ] Dev Lakhani commented on HBASE-13449: - Any update on this? In production this method is not sustainable. Maybe we then need HBASE_LOCAL_REGIONSERVER_OPTS so that we can append JMX opts to the region server? HBASE_REGIONSERVER_OPTS in local-regionservers.sh is masked and silently reset. --- Key: HBASE-13449 URL: https://issues.apache.org/jira/browse/HBASE-13449 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Dev Lakhani In local-regionservers.sh there is a line which masks the current HBASE_REGIONSERVER_OPTS https://github.com/apache/hbase/blob/master/bin/local-regionservers.sh#L38 # sanity check: make sure your regionserver opts don't use ports [i.e. JMX/DBG] export HBASE_REGIONSERVER_OPTS="" We are having trouble with this because: 1) As per normal Hadoop convention, env variables should be in hbase-env.sh; as with other daemons this is where we set JMX/JVM properties. 2) Whatever we set in hbase-env.sh gets masked by this line. Is there any reason this line is included in the runner, and can it be moved to the env.sh file? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
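The HBASE_LOCAL_REGIONSERVER_OPTS idea suggested above could look like the following sketch: append to the variable instead of hard-resetting it, so settings from hbase-env.sh survive. The variable name and the JMX port are assumptions for illustration, not an agreed fix:

```shell
# Current behaviour in bin/local-regionservers.sh masks hbase-env.sh settings:
#   export HBASE_REGIONSERVER_OPTS=""
# Proposed sketch: a separate variable for local-instance options, appended
# rather than overwriting, so JMX/JVM opts from hbase-env.sh are preserved.
HBASE_LOCAL_REGIONSERVER_OPTS="-Dcom.sun.management.jmxremote.port=10102"
export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS} ${HBASE_LOCAL_REGIONSERVER_OPTS}"
echo "${HBASE_REGIONSERVER_OPTS}"
```

Per-instance port collisions (the reason the reset exists) would still need handling, e.g. by offsetting the port with the instance number.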
[jira] [Commented] (HBASE-11747) ClusterStatus is too bulky
[ https://issues.apache.org/jira/browse/HBASE-11747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514224#comment-14514224 ] Dev Lakhani commented on HBASE-11747: - Is there any progress on this, or a workaround we can make use of? The comments above by [~virag] suggest setting CodedInputStream.setSizeLimit(). Where can we do this? Is it possible to do it in application code? Or is it possible to set some other config param; for example, will replication.source.size.capacity help as a workaround until a fix is implemented? Thanks ClusterStatus is too bulky --- Key: HBASE-11747 URL: https://issues.apache.org/jira/browse/HBASE-11747 Project: HBase Issue Type: Sub-task Reporter: Virag Kothari Attachments: exceptiontrace Following exception on 0.98 with 1M regions on a cluster with 160 region servers {code} Caused by: java.io.IOException: Call to regionserverhost:port failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1482) at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1454) at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654) at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712) at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.getClusterStatus(MasterProtos.java:42555) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$5.getClusterStatus(HConnectionManager.java:2132) at org.apache.hadoop.hbase.client.HBaseAdmin$16.call(HBaseAdmin.java:2166) at org.apache.hadoop.hbase.client.HBaseAdmin$16.call(HBaseAdmin.java:2162) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114) ... 
43 more Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit. at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13452) HRegion warning about memstore size miscalculation is not actionable
[ https://issues.apache.org/jira/browse/HBASE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14501268#comment-14501268 ] Dev Lakhani commented on HBASE-13452: - I don't see that anywhere before or after the Memstore size error, or indeed anywhere in our logs for master and regions. We have INFO-level logging, though. Real examples of memstore sizes are: Memstore size is 4390632 Memstore size is 558744 Memstore size is 4390632 Memstore size is 558744 HRegion warning about memstore size miscalculation is not actionable Key: HBASE-13452 URL: https://issues.apache.org/jira/browse/HBASE-13452 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Dev Lakhani Priority: Blocker Fix For: 2.0.0, 1.1.0, 1.0.2 During normal operation the HRegion class reports a message related to memstore flushing in HRegion.class: if (!canFlush) { addAndGetGlobalMemstoreSize(-memstoreSize.get()); } else if (memstoreSize.get() != 0) { LOG.error("Memstore size is " + memstoreSize.get()); } The log file is filled with lots of: Memstore size is: 190192 Memstore size is: 442232 Memstore size is: 190192 ... These messages are uninformative, clog up the logs, and offer no root cause or solution. Maybe the message needs to be more informative, changed to WARN, or some further information provided. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13452) HRegion warning about memstore size miscalculation is not actionable
[ https://issues.apache.org/jira/browse/HBASE-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated HBASE-13452: Description: During normal operation the HRegion class reports a message related to memstore flushing in HRegion.class: if (!canFlush) { addAndGetGlobalMemstoreSize(-memstoreSize.get()); } else if (memstoreSize.get() != 0) { LOG.error("Memstore size is " + memstoreSize.get()); } The log file is filled with lots of: Memstore size is 558744 Memstore size is 4390632 Memstore size is 558744 ... These messages are uninformative, clog up the logs, and offer no root cause or solution. Maybe the message needs to be more informative, changed to WARN, or some further information provided. was: During normal operation the HRegion class reports a message related to memstore flushing in HRegion.class: if (!canFlush) { addAndGetGlobalMemstoreSize(-memstoreSize.get()); } else if (memstoreSize.get() != 0) { LOG.error("Memstore size is " + memstoreSize.get()); } The log file is filled with lots of: Memstore size is: 190192 Memstore size is: 442232 Memstore size is: 190192 ... These messages are uninformative, clog up the logs, and offer no root cause or solution. Maybe the message needs to be more informative, changed to WARN, or some further information provided. HRegion warning about memstore size miscalculation is not actionable Key: HBASE-13452 URL: https://issues.apache.org/jira/browse/HBASE-13452 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Dev Lakhani Priority: Blocker Fix For: 2.0.0, 1.1.0, 1.0.2 During normal operation the HRegion class reports a message related to memstore flushing in HRegion.class: if (!canFlush) { addAndGetGlobalMemstoreSize(-memstoreSize.get()); } else if (memstoreSize.get() != 0) { LOG.error("Memstore size is " + memstoreSize.get()); } The log file is filled with lots of: Memstore size is 558744 Memstore size is 4390632 Memstore size is 558744 ... 
These messages are uninformative, clog up the logs, and offer no root cause or solution. Maybe the message needs to be more informative, changed to WARN, or some further information provided. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (SPARK-6846) Stage kill URL easy to accidentally trigger and possibility for security issue.
[ https://issues.apache.org/jira/browse/SPARK-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14494012#comment-14494012 ] Dev Lakhani commented on SPARK-6846: [~srowen] please go ahead, I won't have time for this this week. Stage kill URL easy to accidentally trigger and possibility for security issue. --- Key: SPARK-6846 URL: https://issues.apache.org/jira/browse/SPARK-6846 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 1.3.0 Reporter: Dev Lakhani Priority: Minor On a similar note: When the kill link is cached in the browser bar, it's easy to accidentally kill a job just by pressing enter. For example: You press the kill stage button and get the prompt whether you want to kill the stage. You launch a new job and start typing: http://localhost:4040/ Chrome, for example, starts auto-completing with http://localhost:4040/stages/kill/?id=0&terminate=true If you accidentally press enter it will kill the current stage without any prompts. I think it's also a bit of a security issue that from any host you can curl/wget/issue: http://localhost:4040/stages/kill/?id=0&terminate=true and it will kill the current stage without prompting. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (HBASE-13449) HBASE_REGIONSERVER_OPTS in local-regionservers.sh is masked and silently reset.
[ https://issues.apache.org/jira/browse/HBASE-13449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14490855#comment-14490855 ] Dev Lakhani commented on HBASE-13449: - Fine, but how do I set region-server-specific properties for JMX/JVM, for example? As mentioned in my comment, the convention has been to set HBASE_REGIONSERVER_OPTS in hbase-env.sh, and similarly for related Hadoop properties to be in hadoop-env.sh. We then need another env variable to append to HBASE_REGIONSERVER_ARGS to ensure that the local region server has its own set of runtime args. HBASE_REGIONSERVER_OPTS in local-regionservers.sh is masked and silently reset. --- Key: HBASE-13449 URL: https://issues.apache.org/jira/browse/HBASE-13449 Project: HBase Issue Type: Bug Affects Versions: 1.0.0 Reporter: Dev Lakhani In local-regionservers.sh there is a line which masks the current HBASE_REGIONSERVER_OPTS https://github.com/apache/hbase/blob/master/bin/local-regionservers.sh#L38 # sanity check: make sure your regionserver opts don't use ports [i.e. JMX/DBG] export HBASE_REGIONSERVER_OPTS="" We are having trouble with this because: 1) As per normal Hadoop convention, env variables should be in hbase-env.sh; as with other daemons this is where we set JMX/JVM properties. 2) Whatever we set in hbase-env.sh gets masked by this line. Is there any reason this line is included in the runner, and can it be moved to the env.sh file? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (SPARK-5273) Improve documentation examples for LinearRegression
[ https://issues.apache.org/jira/browse/SPARK-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated SPARK-5273: --- Affects Version/s: (was: 1.2.0) Improve documentation examples for LinearRegression Key: SPARK-5273 URL: https://issues.apache.org/jira/browse/SPARK-5273 Project: Spark Issue Type: Improvement Components: Documentation Reporter: Dev Lakhani Priority: Minor In the document: https://spark.apache.org/docs/1.1.1/mllib-linear-methods.html under "Linear least squares, Lasso, and ridge regression", the suggested method, LinearRegressionWithSGD.train(): // Building the model val numIterations = 100 val model = LinearRegressionWithSGD.train(parsedData, numIterations) is not ideal even for simple examples such as y=x. It should be replaced with more realistic parameters including a step size: val lr = new LinearRegressionWithSGD() lr.optimizer.setStepSize(0.0001) lr.optimizer.setNumIterations(100) or LinearRegressionWithSGD.train(input, 100, 0.0001) to produce a reasonable MSE. It took me a while, using the dev forum, to learn that the step size should be really small. This might help save someone the same effort when learning MLlib. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5273) Improve documentation examples for LinearRegression
[ https://issues.apache.org/jira/browse/SPARK-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani updated SPARK-5273: --- Affects Version/s: 1.2.0 Improve documentation examples for LinearRegression Key: SPARK-5273 URL: https://issues.apache.org/jira/browse/SPARK-5273 Project: Spark Issue Type: Improvement Components: Documentation Reporter: Dev Lakhani Priority: Minor In the document: https://spark.apache.org/docs/1.1.1/mllib-linear-methods.html under "Linear least squares, Lasso, and ridge regression", the suggested method, LinearRegressionWithSGD.train(): // Building the model val numIterations = 100 val model = LinearRegressionWithSGD.train(parsedData, numIterations) is not ideal even for simple examples such as y=x. It should be replaced with more realistic parameters including a step size: val lr = new LinearRegressionWithSGD() lr.optimizer.setStepSize(0.0001) lr.optimizer.setNumIterations(100) or LinearRegressionWithSGD.train(input, 100, 0.0001) to produce a reasonable MSE. It took me a while, using the dev forum, to learn that the step size should be really small. This might help save someone the same effort when learning MLlib. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5273) Improve documentation examples for LinearRegression
Dev Lakhani created SPARK-5273: -- Summary: Improve documentation examples for LinearRegression Key: SPARK-5273 URL: https://issues.apache.org/jira/browse/SPARK-5273 Project: Spark Issue Type: Improvement Components: Documentation Reporter: Dev Lakhani Priority: Minor In the document: https://spark.apache.org/docs/1.1.1/mllib-linear-methods.html under "Linear least squares, Lasso, and ridge regression", the suggested method, LinearRegressionWithSGD.train(): // Building the model val numIterations = 100 val model = LinearRegressionWithSGD.train(parsedData, numIterations) is not ideal even for simple examples such as y=x. It should be replaced with more realistic parameters including a step size: val lr = new LinearRegressionWithSGD() lr.optimizer.setStepSize(0.0001) lr.optimizer.setNumIterations(100) or LinearRegressionWithSGD.train(input, 100, 0.0001) to produce a reasonable MSE. It took me a while, using the dev forum, to learn that the step size should be really small. This might help save someone the same effort when learning MLlib. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-576) Design and develop a more precise progress estimator
[ https://issues.apache.org/jira/browse/SPARK-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175023#comment-14175023 ] Dev Lakhani commented on SPARK-576: --- I've created a PR for this: https://github.com/apache/spark/pull/2837/ Design and develop a more precise progress estimator Key: SPARK-576 URL: https://issues.apache.org/jira/browse/SPARK-576 Project: Spark Issue Type: Improvement Reporter: Mosharaf Chowdhury In addition to task_completed/total_tasks, we need to have something that says estimated_time_remaining. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API
[ https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14173619#comment-14173619 ] Dev Lakhani commented on SPARK-2321: There are some issues and bugs under the webui component that are active. Should we incorporate these into this Jira or is it best to work on them separately and then merge these (2321) changes later? https://issues.apache.org/jira/browse/SPARK/component/12322616 Design a proper progress reporting event listener API --- Key: SPARK-2321 URL: https://issues.apache.org/jira/browse/SPARK-2321 Project: Spark Issue Type: Improvement Components: Java API, Spark Core Affects Versions: 1.0.0 Reporter: Reynold Xin Assignee: Josh Rosen Priority: Critical This is a ticket to track progress on redesigning the SparkListener and JobProgressListener API. There are multiple problems with the current design, including: 0. I'm not sure if the API is usable in Java (there are at least some enums we used in Scala and a bunch of case classes that might complicate things). 1. The whole API is marked as DeveloperApi, because we haven't paid a lot of attention to it yet. Something as important as progress reporting deserves a more stable API. 2. There is no easy way to connect jobs with stages. Similarly, there is no easy way to connect job groups with jobs / stages. 3. JobProgressListener itself has no encapsulation at all. States can be arbitrarily mutated by external programs. Variable names are sort of randomly decided and inconsistent. We should just revisit these and propose a new, concrete design. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14173926#comment-14173926 ] Dev Lakhani commented on SPARK-3957: Here are my thoughts on a possible approach. Hi All The broadcast occurs from the SparkContext via the BroadcastManager and its newBroadcast method. In the first instance, the broadcast data is stored in the BlockManager of the executor (see HttpBroadcast). Any tracking of broadcast variables must be referenced by the BlockManagerSlaveActor and BlockManagerMasterActor. In particular, UpdateBlockInfo and RemoveBroadcast should update the total memory used in blocks when blocks are added and removed. These can then be hooked up to the UI using a new page like ExecutorsPage and by defining new methods in the relevant listener, such as StorageStatusListener. These are my initial thoughts as someone new to these components; any other ideas or approaches? Broadcast variable memory usage not reflected in UI --- Key: SPARK-3957 URL: https://issues.apache.org/jira/browse/SPARK-3957 Project: Spark Issue Type: Bug Components: Block Manager, Web UI Affects Versions: 1.0.2, 1.1.0 Reporter: Shivaram Venkataraman Assignee: Nan Zhu Memory used by broadcast variables is not reflected in the memory usage reported in the WebUI. For example, the executors tab shows memory used in each executor but this number doesn't include memory used by broadcast variables. Similarly the storage tab only shows the list of RDDs cached and how much memory they use. We should add a separate column / tab for broadcast variables to make it easier to debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
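The bookkeeping proposed in that comment can be sketched independently of Spark's internals: keep a per-block size map updated on block-info updates and broadcast removals, with a running total a UI listener could expose. Class and method names below are illustrative, not Spark's actual classes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical tracker: called when the block manager reports
// UpdateBlockInfo / RemoveBroadcast-style events for broadcast blocks.
public class BroadcastMemoryTracker {
    private final Map<String, Long> blockSizes = new ConcurrentHashMap<>();
    private final AtomicLong totalBytes = new AtomicLong();

    // A block was added or resized: adjust the total by the delta.
    public void onUpdateBlockInfo(String blockId, long sizeBytes) {
        Long old = blockSizes.put(blockId, sizeBytes);
        totalBytes.addAndGet(sizeBytes - (old == null ? 0L : old));
    }

    // A broadcast block was removed: subtract its last known size.
    public void onRemoveBroadcast(String blockId) {
        Long old = blockSizes.remove(blockId);
        if (old != null) totalBytes.addAndGet(-old);
    }

    // What a new UI page / column would render per executor.
    public long totalBroadcastBytes() {
        return totalBytes.get();
    }
}
```

The delta-based update means re-reporting an existing block (a resize) does not double-count it.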
[jira] [Commented] (SPARK-3957) Broadcast variable memory usage not reflected in UI
[ https://issues.apache.org/jira/browse/SPARK-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14174107#comment-14174107 ] Dev Lakhani commented on SPARK-3957: Hi For now I am happy for [~CodingCat] to take this on, maybe once there are some commits I can help with the UI side, but for now I'll hold back. Broadcast variable memory usage not reflected in UI --- Key: SPARK-3957 URL: https://issues.apache.org/jira/browse/SPARK-3957 Project: Spark Issue Type: Bug Components: Block Manager, Web UI Affects Versions: 1.0.2, 1.1.0 Reporter: Shivaram Venkataraman Assignee: Nan Zhu Memory used by broadcast variables are not reflected in the memory usage reported in the WebUI. For example, the executors tab shows memory used in each executor but this number doesn't include memory used by broadcast variables. Similarly the storage tab only shows list of rdds cached and how much memory they use. We should add a separate column / tab for broadcast variables to make it easier to debug. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3644) REST API for Spark application info (jobs / stages / tasks / storage info)
[ https://issues.apache.org/jira/browse/SPARK-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165293#comment-14165293 ] Dev Lakhani commented on SPARK-3644: Hi I am doing some work on the REST/JSON aspects and will be happy to take this on. Can someone assign it to me and/or help me get started? We need to first draft out the various endpoints and document them somewhere. Thanks Dev REST API for Spark application info (jobs / stages / tasks / storage info) -- Key: SPARK-3644 URL: https://issues.apache.org/jira/browse/SPARK-3644 Project: Spark Issue Type: Bug Components: Spark Core, Web UI Reporter: Josh Rosen This JIRA is a forum to draft a design proposal for a REST interface for accessing information about Spark applications, such as job / stage / task / storage status. There have been a number of proposals to serve JSON representations of the information displayed in Spark's web UI. Given that we might redesign the pages of the web UI (and possibly re-implement the UI as a client of a REST API), the API endpoints and their responses should be independent of what we choose to display on particular web UI pages / layouts. Let's start a discussion of what a good REST API would look like from first-principles. We can discuss what urls / endpoints expose access to data, how our JSON responses will be formatted, how fields will be named, how the API will be documented and tested, etc. Some links for inspiration: https://developer.github.com/v3/ http://developer.netflix.com/docs/REST_API_Reference https://helloreverb.com/developers/swagger -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
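As a starting point for drafting the endpoints discussed above, a hypothetical sketch (paths, versioning scheme, and resource names are assumptions to seed discussion, not a settled design):

```
GET /api/v1/applications                                 # list known applications
GET /api/v1/applications/{appId}/jobs                    # job status for one application
GET /api/v1/applications/{appId}/stages                  # stage summaries
GET /api/v1/applications/{appId}/stages/{stageId}/tasks  # per-task metrics
GET /api/v1/applications/{appId}/storage/rdd             # cached RDD / storage info
```

Keeping the version prefix and resource nesting explicit up front makes it easier to evolve the JSON payloads independently of the web UI pages.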
[jira] [Commented] (MAHOUT-1000) Implementation of Single Sample T-Test using Map Reduce/Mahout
[ https://issues.apache.org/jira/browse/MAHOUT-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13259847#comment-13259847 ] Dev Lakhani commented on MAHOUT-1000: - I guess this was a naive attempt at trying to create an MR version of the Apache commons math/statistics package. Following this implementation, the idea is to go on to extend to ANOVAs, Wilcoxon tests, Pearson correlations, Kolmogorov-Smirnov, and other R-like features (but in MR). Yup, it could be done in Pig, but it would likely need a UDF; e.g. the TTest in commons-math uses the TDistribution for looking up statistical values, so perhaps it's better doing the whole thing in Java. This also makes it easier to test and to control/tune the MR jobs. I was just trying to test the waters really and see if there is support for this; if so, there are plenty of basic stats tests that can be implemented for big data. This will require a bit of help from the community. If not, please feel free to close this entry. Cheers Implementation of Single Sample T-Test using Map Reduce/Mahout -- Key: MAHOUT-1000 URL: https://issues.apache.org/jira/browse/MAHOUT-1000 Project: Mahout Issue Type: New Feature Components: Math Affects Versions: Backlog Environment: Linux, Mac OS, Hadoop 0.20.2, Mahout 0.x Reporter: Dev Lakhani Labels: newbie Fix For: Backlog Original Estimate: 672h Remaining Estimate: 672h Implement a map/reduce version of the single-sample t-test to test whether a sample of n subjects comes from a population in which the mean equals a particular value. For a large dataset, say n millions of rows, one can test whether the sample (large as it is) comes from the population mean. Input: 1) specified population mean to be tested against 2) hypothesis direction: i.e. two.sided, less, greater. 3) confidence level or alpha 4) flag to indicate paired or not paired The procedure is as follows: 1. Use Map/Reduce to calculate the mean of the sample. 2. 
Use Map/Reduce to calculate standard error of the population mean. 3. Use Map/Reduce to calculate the t statistic 4. Estimate the degrees of freedom depending on equal sample variances Output 1) The value of the t-statistic. 2) The p-value for the test. 3) Flag that is true if the null hypothesis can be rejected with confidence 1 - alpha; false otherwise. References http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
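Steps 1 to 3 of the procedure above reduce to a short single-node computation of the t statistic, t = (x̄ − μ₀) / (s / √n) with n − 1 degrees of freedom; in the MR version each sum would come from a map/reduce pass instead of a loop. A minimal Java sketch (class and method names are my own):

```java
public class OneSampleTTest {
    // Compute the one-sample t statistic for testing mean == mu0.
    // In the map/reduce version, sum and sum-of-squared-deviations
    // would each be produced by a pass over the distributed data.
    static double tStatistic(double[] sample, double mu0) {
        int n = sample.length;
        double sum = 0;
        for (double x : sample) sum += x;
        double mean = sum / n;                  // step 1: sample mean

        double ss = 0;
        for (double x : sample) ss += (x - mean) * (x - mean);
        double sd = Math.sqrt(ss / (n - 1));    // sample standard deviation
        double se = sd / Math.sqrt(n);          // step 2: standard error of the mean

        return (mean - mu0) / se;               // step 3: t statistic (df = n - 1)
    }
}
```

The p-value (step 4's output) then comes from a t-distribution lookup with n − 1 degrees of freedom, which is exactly where commons-math's TDistribution would be used.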