[jira] [Updated] (SPARK-10617) Leap year miscalculated in sql query

2015-10-06 Thread Alex Rovner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rovner updated SPARK-10617:

Affects Version/s: 1.5.1

> Leap year miscalculated in sql query
> 
>
> Key: SPARK-10617
> URL: https://issues.apache.org/jira/browse/SPARK-10617
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0, 1.5.1, 1.6.0
>Reporter: shao lo
>
> -- This is wrong...returns 2016-03-01
> select date_add(add_months(cast('2015-02-28' as date), 1 * 12), 1)
> -- This is right...returns 2016-02-29
> select date_add(cast('2016-02-28' as date), 1)






[jira] [Updated] (SPARK-10617) Leap year miscalculated in sql query

2015-10-06 Thread Alex Rovner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rovner updated SPARK-10617:

Affects Version/s: 1.6.0

> Leap year miscalculated in sql query
> 
>
> Key: SPARK-10617
> URL: https://issues.apache.org/jira/browse/SPARK-10617
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0, 1.6.0
>Reporter: shao lo
>
> -- This is wrong...returns 2016-03-01
> select date_add(add_months(cast('2015-02-28' as date), 1 * 12), 1)
> -- This is right...returns 2016-02-29
> select date_add(cast('2016-02-28' as date), 1)






[jira] [Commented] (SPARK-10617) Leap year miscalculated in sql query

2015-10-06 Thread Alex Rovner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945985#comment-14945985
 ] 

Alex Rovner commented on SPARK-10617:
-

This issue is present in master as well.
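
For reference, a quick way to reproduce from spark-shell (a minimal sketch; it assumes a sqlContext is available and simply wraps the two queries from the description):

{code}
// Both queries are taken verbatim from the issue description; only the wrapping is new.
sqlContext.sql(
  "select date_add(add_months(cast('2015-02-28' as date), 1 * 12), 1)").show()  // returns 2016-03-01 (reported as wrong)
sqlContext.sql(
  "select date_add(cast('2016-02-28' as date), 1)").show()                      // returns 2016-02-29 (reported as right)
{code}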

> Leap year miscalculated in sql query
> 
>
> Key: SPARK-10617
> URL: https://issues.apache.org/jira/browse/SPARK-10617
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0, 1.6.0
>Reporter: shao lo
>
> -- This is wrong...returns 2016-03-01
> select date_add(add_months(cast('2015-02-28' as date), 1 * 12), 1)
> -- This is right...returns 2016-02-29
> select date_add(cast('2016-02-28' as date), 1)






[jira] [Issue Comment Deleted] (SPARK-10617) Leap year miscalculated in sql query

2015-10-05 Thread Alex Rovner (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rovner updated SPARK-10617:

Comment: was deleted

(was: https://github.com/apache/spark/pull/8980)

> Leap year miscalculated in sql query
> 
>
> Key: SPARK-10617
> URL: https://issues.apache.org/jira/browse/SPARK-10617
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: shao lo
>
> -- This is wrong...returns 2016-03-01
> select date_add(add_months(cast('2015-02-28' as date), 1 * 12), 1)
> -- This is right...returns 2016-02-29
> select date_add(cast('2016-02-28' as date), 1)






[jira] [Commented] (SPARK-10617) Leap year miscalculated in sql query

2015-10-05 Thread Alex Rovner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943519#comment-14943519
 ] 

Alex Rovner commented on SPARK-10617:
-

https://github.com/apache/spark/pull/8980

> Leap year miscalculated in sql query
> 
>
> Key: SPARK-10617
> URL: https://issues.apache.org/jira/browse/SPARK-10617
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: shao lo
>
> -- This is wrong...returns 2016-03-01
> select date_add(add_months(cast('2015-02-28' as date), 1 * 12), 1)
> -- This is right...returns 2016-02-29
> select date_add(cast('2016-02-28' as date), 1)






[jira] [Commented] (SPARK-10617) Leap year miscalculated in sql query

2015-10-05 Thread Alex Rovner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943300#comment-14943300
 ] 

Alex Rovner commented on SPARK-10617:
-

Good catch! Looks like there is a bug in DateTimeUtils.dateAddMonths. I will 
check it out and submit a patch.
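
For what it's worth, the reported result is what you would get if the month arithmetic snaps a last-day-of-month input to the last day of the target month. Here is a minimal Scala sketch of that behavior (using java.time, not the actual DateTimeUtils code; the helper name is made up):

{code}
import java.time.LocalDate

// Hypothetical month-add that preserves "last day of month": if the input is the
// last day of its month, the result is forced to the last day of the target month.
// Illustration only -- not Spark's implementation.
def addMonthsSnapToEom(d: LocalDate, months: Int): LocalDate = {
  val shifted = d.plusMonths(months)
  if (d.getDayOfMonth == d.lengthOfMonth) shifted.withDayOfMonth(shifted.lengthOfMonth)
  else shifted
}

val base = LocalDate.of(2015, 2, 28)         // last day of February 2015
val plusYear = addMonthsSnapToEom(base, 12)  // 2016-02-29, not 2016-02-28
val plusDay = plusYear.plusDays(1)           // 2016-03-01 -- the reported "wrong" value
println(s"$plusYear + 1 day = $plusDay")
{code}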

> Leap year miscalculated in sql query
> 
>
> Key: SPARK-10617
> URL: https://issues.apache.org/jira/browse/SPARK-10617
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: shao lo
>
> -- This is wrong...returns 2016-03-01
> select date_add(add_months(cast('2015-02-28' as date), 1 * 12), 1)
> -- This is right...returns 2016-02-29
> select date_add(cast('2016-02-28' as date), 1)






[jira] [Commented] (SPARK-10896) Parquet join issue

2015-10-04 Thread Alex Rovner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942892#comment-14942892
 ] 

Alex Rovner commented on SPARK-10896:
-

I do not have an HDP cluster handy, so I am running on CDH 5.4 instead, and I am 
unable to reproduce. I tried builds both with and without bundled Hadoop, and ran 
in both local and YARN modes.

I always get the following as the last line of the output when I copy-paste your 
commands:
{noformat}
res3: Array[org.apache.spark.sql.Row] = Array([0,0,0,0], [1,1,1,1], [2,2,2,2], 
[3,3,3,3], [4,4,4,4], [5,5,5,5], [6,6,6,6], [7,7,7,7])
{noformat}

> Parquet join issue
> --
>
> Key: SPARK-10896
> URL: https://issues.apache.org/jira/browse/SPARK-10896
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0, 1.5.1
> Environment: spark-1.5.0-bin-hadoop2.6.tgz with HDP 2.3
>Reporter: Tamas Szuromi
>  Labels: dataframe, hdfs, join, parquet, sql
>
> After loading Parquet files, the join is not working.
> How to reproduce:
> {code:java}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> val arr1 = Array[Row](Row.apply(0, 0), Row.apply(1,1), Row.apply(2,2), 
> Row.apply(3, 3), Row.apply(4, 4), Row.apply(5, 5), Row.apply(6, 6), 
> Row.apply(7, 7))
> val schema1 = StructType(
>   StructField("id", IntegerType) ::
>   StructField("value1", IntegerType) :: Nil)
> val df1 = sqlContext.createDataFrame(sc.parallelize(arr1), schema1)
> val arr2 = Array[Row](Row.apply(0, 0), Row.apply(1,1), Row.apply(2,2), 
> Row.apply(3, 3), Row.apply(4, 4), Row.apply(5, 5), Row.apply(6, 6), 
> Row.apply(7, 7))
> val schema2 = StructType(
>   StructField("otherId", IntegerType) ::
>   StructField("value2", IntegerType) :: Nil)
> val df2 = sqlContext.createDataFrame(sc.parallelize(arr2), schema2)
> val res = df1.join(df2, df1("id")===df2("otherId"))
> df1.take(10)
> df2.take(10)
> res.count()
> res.take(10)
> df1.write.format("parquet").save("hdfs:///tmp/df1")
> df2.write.format("parquet").save("hdfs:///tmp/df2")
> val df1=sqlContext.read.parquet("hdfs:///tmp/df1/*.parquet")
> val df2=sqlContext.read.parquet("hdfs:///tmp/df2/*.parquet")
> val res = df1.join(df2, df1("id")===df2("otherId"))
> df1.take(10)
> df2.take(10)
> res.count()
> res.take(10)
> {code}
> Output
> {code:java}
> Array[org.apache.spark.sql.Row] = Array([0,0], [1,1], [2,2], [3,3], [4,4], 
> [5,5], [6,6], [7,7]) 
> Array[org.apache.spark.sql.Row] = Array([0,0], [1,1], [2,2], [3,3], [4,4], 
> [5,5], [6,6], [7,7]) 
> Long = 8 
> Array[org.apache.spark.sql.Row] = Array([0,0,0,0], [1,1,1,1], [2,2,2,2], 
> [3,3,3,3], [4,4,4,4], [5,5,5,5], [6,6,6,6], [7,7,7,7]) 
> {code}
> After reading back:
> {code:java}
> Array[org.apache.spark.sql.Row] = Array([0,0], [1,1], [2,2], [3,3], [4,4], 
> [5,5], [6,6], [7,7]) 
> Array[org.apache.spark.sql.Row] = Array([0,0], [1,1], [2,2], [3,3], [4,4], 
> [5,5], [6,6], [7,7]) 
> Long = 4 
> Array[org.apache.spark.sql.Row] = Array([0,0,0,5], [2,2,2,null], [4,4,4,5], 
> [6,6,6,null])
> {code}






[jira] [Commented] (SPARK-3978) Schema change on Spark-Hive (Parquet file format) table not working

2015-09-16 Thread Alex Rovner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791150#comment-14791150
 ] 

Alex Rovner commented on SPARK-3978:


[~barge.nilesh] What version of Spark have you tested with?

> Schema change on Spark-Hive (Parquet file format) table not working
> ---
>
> Key: SPARK-3978
> URL: https://issues.apache.org/jira/browse/SPARK-3978
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Nilesh Barge
>Assignee: Alex Rovner
> Fix For: 1.5.0
>
>
> On the following releases: Spark 1.1.0 (built using sbt/sbt 
> -Dhadoop.version=2.2.0 -Phive assembly), Apache HDFS 2.2.
> A Spark job is able to create/add/read data in Hive Parquet-formatted tables 
> using HiveContext.
> But after changing the schema, the Spark job is not able to read the data and 
> throws the following exception:
> java.lang.ArrayIndexOutOfBoundsException: 2 
> at 
> org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:127)
>  
> at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:284)
>  
> at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:278)
>  
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
> at scala.collection.Iterator$class.foreach(Iterator.scala:727) 
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) 
> at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) 
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) 
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) 
> at 
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) 
> at scala.collection.AbstractIterator.to(Iterator.scala:1157) 
> at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) 
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) 
> at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) 
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) 
> at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774) 
> at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774) 
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
>  
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
>  
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) 
> at org.apache.spark.scheduler.Task.run(Task.scala:54) 
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177) 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  
> at java.lang.Thread.run(Thread.java:744)
> code snippet in short: 
> hiveContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS people_table (name 
> String, age INT) ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe' 
> STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat' 
> OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'"); 
> hiveContext.sql("INSERT INTO TABLE people_table SELECT name, age FROM 
> temp_table_people1"); 
> hiveContext.sql("SELECT * FROM people_table"); //Here, data read was 
> successful.  
> hiveContext.sql("ALTER TABLE people_table ADD COLUMNS (gender STRING)"); 
> hiveContext.sql("SELECT * FROM people_table"); //Not able to read existing 
> data and ArrayIndexOutOfBoundsException is thrown.






[jira] [Commented] (SPARK-3978) Schema change on Spark-Hive (Parquet file format) table not working

2015-09-05 Thread Alex Rovner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732033#comment-14732033
 ] 

Alex Rovner commented on SPARK-3978:


I have verified that altering a table stored as Parquet files and then querying 
it through the spark-sql shell works as expected in master.
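
For reference, the check amounts to something like the following (a sketch using a HiveContext rather than the spark-sql shell; table and column names come from the snippet in the description, and STORED AS PARQUET stands in for the deprecated SerDe used there):

{code}
// Equivalent of the steps in the description, run against a master build.
hiveContext.sql("CREATE TABLE IF NOT EXISTS people_table (name STRING, age INT) STORED AS PARQUET")
hiveContext.sql("INSERT INTO TABLE people_table SELECT name, age FROM temp_table_people1")
hiveContext.sql("SELECT * FROM people_table").show()      // reads fine
hiveContext.sql("ALTER TABLE people_table ADD COLUMNS (gender STRING)")
hiveContext.sql("SELECT * FROM people_table").show()      // also fine in master; threw ArrayIndexOutOfBoundsException on 1.1.0
{code}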

> Schema change on Spark-Hive (Parquet file format) table not working
> ---
>
> Key: SPARK-3978
> URL: https://issues.apache.org/jira/browse/SPARK-3978
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Nilesh Barge
>
> On the following releases: Spark 1.1.0 (built using sbt/sbt 
> -Dhadoop.version=2.2.0 -Phive assembly), Apache HDFS 2.2.
> A Spark job is able to create/add/read data in Hive Parquet-formatted tables 
> using HiveContext.
> But after changing the schema, the Spark job is not able to read the data and 
> throws the following exception:
> java.lang.ArrayIndexOutOfBoundsException: 2 
> at 
> org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:127)
>  
> at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:284)
>  
> at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:278)
>  
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) 
> at scala.collection.Iterator$class.foreach(Iterator.scala:727) 
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) 
> at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) 
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) 
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) 
> at 
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) 
> at scala.collection.AbstractIterator.to(Iterator.scala:1157) 
> at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) 
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) 
> at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) 
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) 
> at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774) 
> at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774) 
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
>  
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
>  
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) 
> at org.apache.spark.scheduler.Task.run(Task.scala:54) 
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177) 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  
> at java.lang.Thread.run(Thread.java:744)
> code snippet in short: 
> hiveContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS people_table (name 
> String, age INT) ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe' 
> STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat' 
> OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'"); 
> hiveContext.sql("INSERT INTO TABLE people_table SELECT name, age FROM 
> temp_table_people1"); 
> hiveContext.sql("SELECT * FROM people_table"); //Here, data read was 
> successful.  
> hiveContext.sql("ALTER TABLE people_table ADD COLUMNS (gender STRING)"); 
> hiveContext.sql("SELECT * FROM people_table"); //Not able to read existing 
> data and ArrayIndexOutOfBoundsException is thrown.






[jira] [Comment Edited] (SPARK-3231) select on a table in parquet format containing smallint as a field type does not work

2015-09-05 Thread Alex Rovner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732021#comment-14732021
 ] 

Alex Rovner edited comment on SPARK-3231 at 9/5/15 4:03 PM:


This is no longer an issue in master. I just verified that it works correctly. 
If you can upgrade to a later version of Spark and try this operation again, it 
would be helpful to know which version fixed it, in case someone is interested 
in backporting the fix.


was (Author: arov):
This is no longer an issue in master. I just verified that it works correctly.

> select on a table in parquet format containing smallint as a field type does 
> not work
> -
>
> Key: SPARK-3231
> URL: https://issues.apache.org/jira/browse/SPARK-3231
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
> Environment: The table is created through Hive-0.13.
> SparkSql 1.1 is used.
>Reporter: chirag aggarwal
>
> A table is created through Hive. This table has a field of type smallint, and 
> the format of the table is Parquet.
> select on this table works perfectly in the Hive shell.
> But when the select is run on this table from spark-sql, the query fails.
> Steps to reproduce the issue:
> --
> hive> create table abct (a smallint, b int) row format delimited fields 
> terminated by '|' stored as textfile;
> A text file is stored in hdfs for this table.
> hive> create table abc (a smallint, b int) stored as parquet; 
> hive> insert overwrite table abc select * from abct;
> hive> select * from abc;
> 2 1
> 2 2
> 2 3
> spark-sql> select * from abc;
> 10:08:46 ERROR CliDriver: org.apache.spark.SparkException: Job aborted due to 
> stage failure: Task 0.0 in stage 33.0 (TID 2340) had a not serializable 
> result: org.apache.hadoop.io.IntWritable
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1158)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1147)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1146)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1146)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:685)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:685)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:685)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1364)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> But if the type of this column is now changed to int, then spark-sql gives 
> the correct results.
> hive> alter table abc change a a int;
> spark-sql> select * from abc;
> 2 1
> 2 2
> 2 3






[jira] [Commented] (SPARK-3231) select on a table in parquet format containing smallint as a field type does not work

2015-09-05 Thread Alex Rovner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732021#comment-14732021
 ] 

Alex Rovner commented on SPARK-3231:


This is no longer an issue in master. I just verified that it works correctly.
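
For reference, the check boils down to re-running the failing query from the description below against a master build (a minimal sketch; table abc is the one created through Hive in those steps, and a HiveContext is assumed to be available as hiveContext):

{code}
// On 1.1.0 this failed with "had a not serializable result: org.apache.hadoop.io.IntWritable";
// against master the smallint column reads back correctly.
hiveContext.sql("SELECT * FROM abc").collect().foreach(println)
{code}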

> select on a table in parquet format containing smallint as a field type does 
> not work
> -
>
> Key: SPARK-3231
> URL: https://issues.apache.org/jira/browse/SPARK-3231
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
> Environment: The table is created through Hive-0.13.
> SparkSql 1.1 is used.
>Reporter: chirag aggarwal
>
> A table is created through Hive. This table has a field of type smallint, and 
> the format of the table is Parquet.
> select on this table works perfectly in the Hive shell.
> But when the select is run on this table from spark-sql, the query fails.
> Steps to reproduce the issue:
> --
> hive> create table abct (a smallint, b int) row format delimited fields 
> terminated by '|' stored as textfile;
> A text file is stored in hdfs for this table.
> hive> create table abc (a smallint, b int) stored as parquet; 
> hive> insert overwrite table abc select * from abct;
> hive> select * from abc;
> 2 1
> 2 2
> 2 3
> spark-sql> select * from abc;
> 10:08:46 ERROR CliDriver: org.apache.spark.SparkException: Job aborted due to 
> stage failure: Task 0.0 in stage 33.0 (TID 2340) had a not serializable 
> result: org.apache.hadoop.io.IntWritable
>   at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1158)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1147)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1146)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1146)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:685)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:685)
>   at scala.Option.foreach(Option.scala:236)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:685)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1364)
>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> But if the type of this column is now changed to int, then spark-sql gives 
> the correct results.
> hive> alter table abc change a a int;
> spark-sql> select * from abc;
> 2 1
> 2 2
> 2 3






[jira] [Commented] (SPARK-10397) Make Python's SparkContext self-descriptive on "print sc"

2015-09-04 Thread Alex Rovner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731619#comment-14731619
 ] 

Alex Rovner commented on SPARK-10397:
-

Pull: https://github.com/apache/spark/pull/8608

{noformat}
>>> sc
{'_accumulatorServer': ,
 '_batchSize': 0,
 '_callsite': CallSite(function='', 
file='/Users/alex.rovner/git/spark/python/pyspark/shell.py', linenum=43),
 '_conf': {'_jconf': JavaObject id=o0},
 '_javaAccumulator': JavaObject id=o11,
 '_jsc': JavaObject id=o8,
 '_pickled_broadcast_vars': set([]),
 '_python_includes': [],
 '_temp_dir': 
u'/private/var/folders/hj/v4zb0_f159q8mt4w3j8m2_mrgp/T/spark-a9cc47a9-db90-49a3-a82e-263f0b56268c/pyspark-773c7490-2b2d-4418-a030-256a5b9c1fe1',
 '_unbatched_serializer': PickleSerializer(),
 'appName': u'PySparkShell',
 'environment': {},
 'master': u'local[*]',
 'profiler_collector': None,
 'pythonExec': 'python2.7',
 'pythonVer': '2.7',
 'serializer': AutoBatchedSerializer(PickleSerializer()),
 'sparkHome': None}
>>> print sc
{'_accumulatorServer': ,
 '_batchSize': 0,
 '_callsite': CallSite(function='', 
file='/Users/alex.rovner/git/spark/python/pyspark/shell.py', linenum=43),
 '_conf': {'_jconf': JavaObject id=o0},
 '_javaAccumulator': JavaObject id=o11,
 '_jsc': JavaObject id=o8,
 '_pickled_broadcast_vars': set([]),
 '_python_includes': [],
 '_temp_dir': 
u'/private/var/folders/hj/v4zb0_f159q8mt4w3j8m2_mrgp/T/spark-a9cc47a9-db90-49a3-a82e-263f0b56268c/pyspark-773c7490-2b2d-4418-a030-256a5b9c1fe1',
 '_unbatched_serializer': PickleSerializer(),
 'appName': u'PySparkShell',
 'environment': {},
 'master': u'local[*]',
 'profiler_collector': None,
 'pythonExec': 'python2.7',
 'pythonVer': '2.7',
 'serializer': AutoBatchedSerializer(PickleSerializer()),
 'sparkHome': None}
>>> 

{noformat}

> Make Python's SparkContext self-descriptive on "print sc"
> -
>
> Key: SPARK-10397
> URL: https://issues.apache.org/jira/browse/SPARK-10397
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 1.4.0
>Reporter: Sergey Tryuber
>Priority: Trivial
>
> When I execute in Python shell:
> {code}
> print sc
> {code}
> I receive something like:
> {noformat}
> <pyspark.context.SparkContext object at 0x...>
> {noformat}
> But this is very inconvenient, especially if a user wants to create a 
> good-looking and self-descriptive IPython Notebook. He would like to see some 
> information about his Spark cluster.
> In contrast, H2O context does have this feature and it is very helpful.






[jira] [Commented] (SPARK-10375) Setting the driver memory with SparkConf().set("spark.driver.memory","1g") does not work

2015-09-01 Thread Alex Rovner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725909#comment-14725909
 ] 

Alex Rovner commented on SPARK-10375:
-

[~srowen] Shall we re-open?

> Setting the driver memory with SparkConf().set("spark.driver.memory","1g") 
> does not work
> 
>
> Key: SPARK-10375
> URL: https://issues.apache.org/jira/browse/SPARK-10375
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 1.3.0
> Environment: Running with yarn
>Reporter: Thomas
>Priority: Minor
>
> When running PySpark 1.3.0 with YARN, the following code has no effect:
> pyspark.SparkConf().set("spark.driver.memory","1g")
> The Environment tab in YARN shows that the driver has 1g; however, the 
> Executors tab only shows 512 MB (the default value) for the driver memory.  
> This issue goes away when the driver memory is specified via the command line 
> (i.e. --driver-memory 1g).






[jira] [Commented] (SPARK-10375) Setting the driver memory with SparkConf().set("spark.driver.memory","1g") does not work

2015-09-01 Thread Alex Rovner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725473#comment-14725473
 ] 

Alex Rovner commented on SPARK-10375:
-

May I suggest throwing an exception when certain properties are set that will 
not take effect? (spark.driver.*)
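
To make the suggestion concrete, here is a sketch of the kind of guard meant (not actual Spark code; the property list and wording are illustrative only):

{code}
// Illustrative only: properties such as spark.driver.memory cannot take effect once
// the driver JVM is already running, so setting them programmatically could fail fast
// instead of being silently ignored.
val launchOnlyProps = Set("spark.driver.memory")  // hypothetical list

def checkedSet(settings: scala.collection.mutable.Map[String, String],
               key: String, value: String): Unit = {
  if (launchOnlyProps.contains(key)) {
    throw new IllegalArgumentException(
      s"$key has no effect when set at runtime; pass it to spark-submit " +
       "(e.g. --driver-memory 1g) or set it in spark-defaults.conf instead.")
  }
  settings(key) = value
}
{code}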

> Setting the driver memory with SparkConf().set("spark.driver.memory","1g") 
> does not work
> 
>
> Key: SPARK-10375
> URL: https://issues.apache.org/jira/browse/SPARK-10375
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.3.0
> Environment: Running with yarn
>Reporter: Thomas
>Priority: Minor
>
> When running PySpark 1.3.0 with YARN, the following code has no effect:
> pyspark.SparkConf().set("spark.driver.memory","1g")
> The Environment tab in YARN shows that the driver has 1g; however, the 
> Executors tab only shows 512 MB (the default value) for the driver memory.  
> This issue goes away when the driver memory is specified via the command line 
> (i.e. --driver-memory 1g).






[jira] [Commented] (SPARK-5480) GraphX pageRank: java.lang.ArrayIndexOutOfBoundsException:

2015-05-08 Thread Alex Rovner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535113#comment-14535113
 ] 

Alex Rovner commented on SPARK-5480:


We are running Spark 1.3.0 on Hadoop 2.6.0-cdh5.4.0 and are facing this 
exception on every execution. I need to work through my channels to see if I 
can share the code and data that cause this exception.

> GraphX pageRank: java.lang.ArrayIndexOutOfBoundsException: 
> ---
>
> Key: SPARK-5480
> URL: https://issues.apache.org/jira/browse/SPARK-5480
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Affects Versions: 1.2.0
> Environment: Yarn client
>Reporter: Stephane Maarek
>
> Running the following code:
> val subgraph = graph.subgraph (
>   vpred = (id,article) => //working predicate)
> ).cache()
> println( s"Subgraph contains ${subgraph.vertices.count} nodes and 
> ${subgraph.edges.count} edges")
> val prGraph = subgraph.staticPageRank(5).cache
> val titleAndPrGraph = subgraph.outerJoinVertices(prGraph.vertices) {
>   (v, title, rank) => (rank.getOrElse(0.0), title)
> }
> titleAndPrGraph.vertices.top(13) {
>   Ordering.by((entry: (VertexId, (Double, _))) => entry._2._1)
> }.foreach(t => println(t._2._2._1 + ": " + t._2._1 + ", id:" + t._1))
> Returns a graph with 5000 nodes and 4000 edges.
> Then it crashes during the PageRank with the following:
> 15/01/29 05:51:07 INFO scheduler.TaskSetManager: Starting task 125.0 in stage 
> 39.0 (TID 1808, *HIDDEN, PROCESS_LOCAL, 2059 bytes)
> 15/01/29 05:51:07 WARN scheduler.TaskSetManager: Lost task 107.0 in stage 
> 39.0 (TID 1794, *HIDDEN): java.lang.ArrayIndexOutOfBoundsException: -1
> at 
> org.apache.spark.graphx.util.collection.GraphXPrimitiveKeyOpenHashMap$mcJI$sp.apply$mcJI$sp(GraphXPrimitiveKeyOpenHashMap.scala:64)
> at 
> org.apache.spark.graphx.impl.EdgePartition.updateVertices(EdgePartition.scala:91)
> at 
> org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:75)
> at 
> org.apache.spark.graphx.impl.ReplicatedVertexView$$anonfun$2$$anonfun$apply$1.apply(ReplicatedVertexView.scala:73)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at 
> org.apache.spark.graphx.impl.EdgeRDDImpl$$anonfun$mapEdgePartitions$1.apply(EdgeRDDImpl.scala:110)
> at 
> org.apache.spark.graphx.impl.EdgeRDDImpl$$anonfun$mapEdgePartitions$1.apply(EdgeRDDImpl.scala:108)
> at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:601)
> at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:601)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at 
> org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at 
> org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at 
> org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:61)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> at 
> org.apache.spark.scheduler.ShuffleMapTa

[jira] [Commented] (SPARK-5529) BlockManager heartbeat expiration does not kill executor

2015-04-28 Thread Alex Rovner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517499#comment-14517499
 ] 

Alex Rovner commented on SPARK-5529:


Sorry about all the pull requests. Here is one rebased against the right branch 
and without any compilation issues: https://github.com/apache/spark/pull/5747

> BlockManager heartbeat expiration does not kill executor
> 
>
> Key: SPARK-5529
> URL: https://issues.apache.org/jira/browse/SPARK-5529
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 1.2.0
>Reporter: Hong Shen
>Assignee: Hong Shen
> Fix For: 1.4.0
>
> Attachments: SPARK-5529.patch
>
>
> When I run a Spark job, one executor hangs; after 120s its BlockManager is 
> removed by the driver, but it takes another half hour before the executor is 
> removed by the driver. Here is the log:
> {code}
> 15/02/02 14:58:43 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(1, 10.215.143.14, 47234) with no recent heart beats: 147198ms 
> exceeds 12ms
> 
> 15/02/02 15:26:55 ERROR YarnClientClusterScheduler: Lost executor 1 on 
> 10.215.143.14: remote Akka client disassociated
> 15/02/02 15:26:55 WARN ReliableDeliverySupervisor: Association with remote 
> system [akka.tcp://sparkExecutor@10.215.143.14:46182] has failed, address is 
> now gated for [5000] ms. Reason is: [Disassociated].
> 15/02/02 15:26:55 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 
> 0.0
> 15/02/02 15:26:55 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3, 
> 10.215.143.14): ExecutorLostFailure (executor 1 lost)
> 15/02/02 15:26:55 ERROR YarnClientSchedulerBackend: Asked to remove 
> non-existent executor 1
> 15/02/02 15:26:55 INFO DAGScheduler: Executor lost: 1 (epoch 0)
> 15/02/02 15:26:55 INFO BlockManagerMasterActor: Trying to remove executor 1 
> from BlockManagerMaster.
> 15/02/02 15:26:55 INFO BlockManagerMaster: Removed 1 successfully in 
> removeExecutor
> {code}






[jira] [Commented] (SPARK-5529) BlockManager heartbeat expiration does not kill executor

2015-04-28 Thread Alex Rovner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517298#comment-14517298
 ] 

Alex Rovner commented on SPARK-5529:


Sorry, I pulled the trigger too quickly... I need to resolve some compilation errors.

> BlockManager heartbeat expiration does not kill executor
> 
>
> Key: SPARK-5529
> URL: https://issues.apache.org/jira/browse/SPARK-5529
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 1.2.0
>Reporter: Hong Shen
>Assignee: Hong Shen
> Fix For: 1.4.0
>
> Attachments: SPARK-5529.patch
>
>
> When I run a Spark job, one executor hangs; after 120s its BlockManager is 
> removed by the driver, but it takes another half hour before the executor is 
> removed by the driver. Here is the log:
> {code}
> 15/02/02 14:58:43 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(1, 10.215.143.14, 47234) with no recent heart beats: 147198ms 
> exceeds 12ms
> 
> 15/02/02 15:26:55 ERROR YarnClientClusterScheduler: Lost executor 1 on 
> 10.215.143.14: remote Akka client disassociated
> 15/02/02 15:26:55 WARN ReliableDeliverySupervisor: Association with remote 
> system [akka.tcp://sparkExecutor@10.215.143.14:46182] has failed, address is 
> now gated for [5000] ms. Reason is: [Disassociated].
> 15/02/02 15:26:55 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 
> 0.0
> 15/02/02 15:26:55 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3, 
> 10.215.143.14): ExecutorLostFailure (executor 1 lost)
> 15/02/02 15:26:55 ERROR YarnClientSchedulerBackend: Asked to remove 
> non-existent executor 1
> 15/02/02 15:26:55 INFO DAGScheduler: Executor lost: 1 (epoch 0)
> 15/02/02 15:26:55 INFO BlockManagerMasterActor: Trying to remove executor 1 
> from BlockManagerMaster.
> 15/02/02 15:26:55 INFO BlockManagerMaster: Removed 1 successfully in 
> removeExecutor
> {code}






[jira] [Commented] (SPARK-5529) BlockManager heartbeat expiration does not kill executor

2015-04-28 Thread Alex Rovner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517281#comment-14517281
 ] 

Alex Rovner commented on SPARK-5529:


Applied patch to 1.3: https://github.com/apache/spark/pull/5745

> BlockManager heartbeat expiration does not kill executor
> 
>
> Key: SPARK-5529
> URL: https://issues.apache.org/jira/browse/SPARK-5529
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 1.2.0
>Reporter: Hong Shen
>Assignee: Hong Shen
> Fix For: 1.4.0
>
> Attachments: SPARK-5529.patch
>
>
> When I run a Spark job, one executor hangs; after 120s its BlockManager is 
> removed by the driver, but it takes another half hour before the executor is 
> removed by the driver. Here is the log:
> {code}
> 15/02/02 14:58:43 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(1, 10.215.143.14, 47234) with no recent heart beats: 147198ms 
> exceeds 12ms
> 
> 15/02/02 15:26:55 ERROR YarnClientClusterScheduler: Lost executor 1 on 
> 10.215.143.14: remote Akka client disassociated
> 15/02/02 15:26:55 WARN ReliableDeliverySupervisor: Association with remote 
> system [akka.tcp://sparkExecutor@10.215.143.14:46182] has failed, address is 
> now gated for [5000] ms. Reason is: [Disassociated].
> 15/02/02 15:26:55 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 
> 0.0
> 15/02/02 15:26:55 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3, 
> 10.215.143.14): ExecutorLostFailure (executor 1 lost)
> 15/02/02 15:26:55 ERROR YarnClientSchedulerBackend: Asked to remove 
> non-existent executor 1
> 15/02/02 15:26:55 INFO DAGScheduler: Executor lost: 1 (epoch 0)
> 15/02/02 15:26:55 INFO BlockManagerMasterActor: Trying to remove executor 1 
> from BlockManagerMaster.
> 15/02/02 15:26:55 INFO BlockManagerMaster: Removed 1 successfully in 
> removeExecutor
> {code}






[jira] [Commented] (SPARK-5529) BlockManager heartbeat expiration does not kill executor

2015-04-28 Thread Alex Rovner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14517257#comment-14517257
 ] 

Alex Rovner commented on SPARK-5529:


CDH is usually somewhat slow to pick up the latest changes, though. Would it 
be possible to backport this fix to 1.3?

> BlockManager heartbeat expiration does not kill executor
> 
>
> Key: SPARK-5529
> URL: https://issues.apache.org/jira/browse/SPARK-5529
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 1.2.0
>Reporter: Hong Shen
>Assignee: Hong Shen
> Fix For: 1.4.0
>
> Attachments: SPARK-5529.patch
>
>
> When I run a Spark job, one executor hangs; after 120s its BlockManager is 
> removed by the driver, but it takes another half hour before the executor is 
> removed by the driver. Here is the log:
> {code}
> 15/02/02 14:58:43 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(1, 10.215.143.14, 47234) with no recent heart beats: 147198ms 
> exceeds 12ms
> 
> 15/02/02 15:26:55 ERROR YarnClientClusterScheduler: Lost executor 1 on 
> 10.215.143.14: remote Akka client disassociated
> 15/02/02 15:26:55 WARN ReliableDeliverySupervisor: Association with remote 
> system [akka.tcp://sparkExecutor@10.215.143.14:46182] has failed, address is 
> now gated for [5000] ms. Reason is: [Disassociated].
> 15/02/02 15:26:55 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 
> 0.0
> 15/02/02 15:26:55 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3, 
> 10.215.143.14): ExecutorLostFailure (executor 1 lost)
> 15/02/02 15:26:55 ERROR YarnClientSchedulerBackend: Asked to remove 
> non-existent executor 1
> 15/02/02 15:26:55 INFO DAGScheduler: Executor lost: 1 (epoch 0)
> 15/02/02 15:26:55 INFO BlockManagerMasterActor: Trying to remove executor 1 
> from BlockManagerMaster.
> 15/02/02 15:26:55 INFO BlockManagerMaster: Removed 1 successfully in 
> removeExecutor
> {code}






[jira] [Commented] (SPARK-5529) BlockManager heartbeat expiration does not kill executor

2015-04-24 Thread Alex Rovner (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511374#comment-14511374
 ] 

Alex Rovner commented on SPARK-5529:


We are facing this issue on CDH 5.3.2 with Spark 1.2.0-SNAPSHOT.

Is there any workaround other than upgrading to Spark 1.4?

> BlockManager heartbeat expiration does not kill executor
> 
>
> Key: SPARK-5529
> URL: https://issues.apache.org/jira/browse/SPARK-5529
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 1.2.0
>Reporter: Hong Shen
>Assignee: Hong Shen
> Fix For: 1.4.0
>
> Attachments: SPARK-5529.patch
>
>
> When I run a Spark job, one executor hangs; after 120s its BlockManager is 
> removed by the driver, but it takes another half hour before the executor is 
> removed by the driver. Here is the log:
> {code}
> 15/02/02 14:58:43 WARN BlockManagerMasterActor: Removing BlockManager 
> BlockManagerId(1, 10.215.143.14, 47234) with no recent heart beats: 147198ms 
> exceeds 12ms
> 
> 15/02/02 15:26:55 ERROR YarnClientClusterScheduler: Lost executor 1 on 
> 10.215.143.14: remote Akka client disassociated
> 15/02/02 15:26:55 WARN ReliableDeliverySupervisor: Association with remote 
> system [akka.tcp://sparkExecutor@10.215.143.14:46182] has failed, address is 
> now gated for [5000] ms. Reason is: [Disassociated].
> 15/02/02 15:26:55 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 
> 0.0
> 15/02/02 15:26:55 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3, 
> 10.215.143.14): ExecutorLostFailure (executor 1 lost)
> 15/02/02 15:26:55 ERROR YarnClientSchedulerBackend: Asked to remove 
> non-existent executor 1
> 15/02/02 15:26:55 INFO DAGScheduler: Executor lost: 1 (epoch 0)
> 15/02/02 15:26:55 INFO BlockManagerMasterActor: Trying to remove executor 1 
> from BlockManagerMaster.
> 15/02/02 15:26:55 INFO BlockManagerMaster: Removed 1 successfully in 
> removeExecutor
> {code}


