[jira] [Commented] (SPARK-18893) Not support "alter table .. add columns .."

2017-01-16 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824848#comment-15824848
 ] 

Hong Shen commented on SPARK-18893:
---

+1, "alter table add/replace columns" is a very important feature. [~yhuai] 
[~rxin]

> Not support "alter table .. add columns .." 
> 
>
> Key: SPARK-18893
> URL: https://issues.apache.org/jira/browse/SPARK-18893
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: zuotingbing
>
> When we upgraded Spark from version 1.5.2 to 2.0.1, every case of ours that 
> changes a table with "alter table ... add columns ..." failed, even though the 
> official document says "All Hive DDL Functions, including: alter table" are 
> supported: http://spark.apache.org/docs/latest/sql-programming-guide.html.
> Is there any plan to support the SQL "alter table .. add/replace columns"?






[jira] [Updated] (SPARK-18910) Can't use UDF that jar file in hdfs

2016-12-16 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-18910:
--
Description: 
When I create a UDF whose jar file is in HDFS, I can't use the UDF.
{code}
spark-sql> create function trans_array as 'com.test.udf.TransArray'  using jar 
'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';

spark-sql> describe function trans_array;
Function: test_db.trans_array
Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
Usage: N/A.
Time taken: 0.127 seconds, Fetched 3 row(s)

spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) from 
test_spark limit 10;
Error in query: Undefined function: 'trans_array'. This function is neither a 
registered temporary function nor a permanent function registered in the 
database 'test_db'.; line 1 pos 7
{code}

The reason is that in 
org.apache.spark.sql.internal.SessionState.FunctionResourceLoader.loadResource, 
uri.toURL fails with "unknown protocol: hdfs":
{code}
  def addJar(path: String): Unit = {
sparkSession.sparkContext.addJar(path)

val uri = new Path(path).toUri
val jarURL = if (uri.getScheme == null) {
  // `path` is a local file path without a URL scheme
  new File(path).toURI.toURL
} else {
  // `path` is a URL with a scheme
  {color:red}uri.toURL{color}
}
jarClassLoader.addURL(jarURL)
Thread.currentThread().setContextClassLoader(jarClassLoader)
  }
{code}

I think we should call the setURLStreamHandlerFactory method on URL with an instance of 
FsUrlStreamHandlerFactory, like this:
{code}
static {
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
}
{code}
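For illustration only, here is a minimal Scala sketch of how such a registration could be 
guarded so it runs at most once per JVM; the object name HdfsUrlSupport is hypothetical, 
and this is an assumption about one possible shape of the change, not the patch that was 
actually merged:
{code}
import java.net.URL
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory

// Hypothetical helper: register Hadoop's URL stream handler factory exactly once,
// so that uri.toURL on an hdfs:// path no longer fails with "unknown protocol: hdfs".
object HdfsUrlSupport {
  @volatile private var registered = false

  def ensureRegistered(): Unit = synchronized {
    if (!registered) {
      // java.net.URL accepts a stream handler factory at most once per JVM, hence the guard.
      URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
      registered = true
    }
  }
}
{code}
With something like this called before the else branch of addJar, uri.toURL for an 
hdfs:// jar would resolve instead of throwing.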


  was:
When I create a UDF whose jar file is in HDFS, I can't use the UDF.
{code}
spark-sql> create function trans_array as 'com.test.udf.TransArray'  using jar 
'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';

spark-sql> describe function trans_array;
Function: test_db.trans_array
Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
Usage: N/A.
Time taken: 0.127 seconds, Fetched 3 row(s)

spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) from 
test_spark limit 10;
Error in query: Undefined function: 'trans_array'. This function is neither a 
registered temporary function nor a permanent function registered in the 
database 'test_db'.; line 1 pos 7
{code}

The reason is that in 
org.apache.spark.sql.internal.SessionState.FunctionResourceLoader.loadResource, 
uri.toURL fails with "unknown protocol: hdfs":
{code}
  def addJar(path: String): Unit = {
sparkSession.sparkContext.addJar(path)

val uri = new Path(path).toUri
val jarURL = if (uri.getScheme == null) {
  // `path` is a local file path without a URL scheme
  new File(path).toURI.toURL
} else {
  // `path` is a URL with a scheme
  {color:red}uri.toURL{color}
}
jarClassLoader.addURL(jarURL)
Thread.currentThread().setContextClassLoader(jarClassLoader)
  }
{code}

I think we should call the setURLStreamHandlerFactory method on URL with an instance of 
FsUrlStreamHandlerFactory, like this:
{code}
static {
// This method can be called at most once in a given JVM.
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
}
{code}



> Can't use UDF that jar file in hdfs
> ---
>
> Key: SPARK-18910
> URL: https://issues.apache.org/jira/browse/SPARK-18910
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Hong Shen
>
> When I create a UDF whose jar file is in HDFS, I can't use the UDF.
> {code}
> spark-sql> create function trans_array as 'com.test.udf.TransArray'  using 
> jar 
> 'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';
> spark-sql> describe function trans_array;
> Function: test_db.trans_array
> Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
> Usage: N/A.
> Time taken: 0.127 seconds, Fetched 3 row(s)
> spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) 
> from test_spark limit 10;
> Error in query: Undefined function: 'trans_array'. This function is neither a 
> registered temporary function nor a permanent function registered in the 
> database 'test_db'.; line 1 pos 7
> {code}
> The reason is that in 
> org.apache.spark.sql.internal.SessionState.FunctionResourceLoader.loadResource, 
> uri.toURL fails with "unknown protocol: hdfs":
> {code}
>   def addJar(path: String): Unit = {
> sparkSession.sparkContext.addJar(path)
> val uri = new Path(path).toUri
> val jarURL = if (uri.getScheme == null) {
>   // `path` is a local file path without a URL scheme
>   new File(path).toURI.toURL

[jira] [Updated] (SPARK-18910) Can't use UDF that jar file in hdfs

2016-12-16 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-18910:
--
Description: 
When I create a UDF whose jar file is in HDFS, I can't use the UDF.
{code}
spark-sql> create function trans_array as 'com.test.udf.TransArray'  using jar 
'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';

spark-sql> describe function trans_array;
Function: test_db.trans_array
Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
Usage: N/A.
Time taken: 0.127 seconds, Fetched 3 row(s)

spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) from 
test_spark limit 10;
Error in query: Undefined function: 'trans_array'. This function is neither a 
registered temporary function nor a permanent function registered in the 
database 'test_db'.; line 1 pos 7
{code}

The reason is that in 
org.apache.spark.sql.internal.SessionState.FunctionResourceLoader.loadResource, 
uri.toURL fails with "unknown protocol: hdfs":
{code}
  def addJar(path: String): Unit = {
sparkSession.sparkContext.addJar(path)

val uri = new Path(path).toUri
val jarURL = if (uri.getScheme == null) {
  // `path` is a local file path without a URL scheme
  new File(path).toURI.toURL
} else {
  // `path` is a URL with a scheme
  {color:red}uri.toURL{color}
}
jarClassLoader.addURL(jarURL)
Thread.currentThread().setContextClassLoader(jarClassLoader)
  }
{code}

I think we should call the setURLStreamHandlerFactory method on URL with an instance of 
FsUrlStreamHandlerFactory, like this:
{code}
static {
// This method can be called at most once in a given JVM.
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
}
{code}


  was:
When I create a UDF whose jar file is in HDFS, I can't use the UDF.

spark-sql> create function trans_array as 'com.test.udf.TransArray'  using jar 
'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';

spark-sql> describe function trans_array;
Function: test_db.trans_array
Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
Usage: N/A.
Time taken: 0.127 seconds, Fetched 3 row(s)

spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) from 
test_spark limit 10;
Error in query: Undefined function: 'trans_array'. This function is neither a 
registered temporary function nor a permanent function registered in the 
database 'test_db'.; line 1 pos 7


The reason is that in 
org.apache.spark.sql.internal.SessionState.FunctionResourceLoader.loadResource, 
uri.toURL fails with "unknown protocol: hdfs":

  def addJar(path: String): Unit = {
sparkSession.sparkContext.addJar(path)

val uri = new Path(path).toUri
val jarURL = if (uri.getScheme == null) {
  // `path` is a local file path without a URL scheme
  new File(path).toURI.toURL
} else {
  // `path` is a URL with a scheme
  {color:red}uri.toURL{color}
}
jarClassLoader.addURL(jarURL)
Thread.currentThread().setContextClassLoader(jarClassLoader)
  }


I think we should call the setURLStreamHandlerFactory method on URL with an instance of 
FsUrlStreamHandlerFactory, like this:

static {
// This method can be called at most once in a given JVM.
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
}




> Can't use UDF that jar file in hdfs
> ---
>
> Key: SPARK-18910
> URL: https://issues.apache.org/jira/browse/SPARK-18910
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Hong Shen
>
> When I create a UDF whose jar file is in HDFS, I can't use the UDF.
> {code}
> spark-sql> create function trans_array as 'com.test.udf.TransArray'  using 
> jar 
> 'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';
> spark-sql> describe function trans_array;
> Function: test_db.trans_array
> Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
> Usage: N/A.
> Time taken: 0.127 seconds, Fetched 3 row(s)
> spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) 
> from test_spark limit 10;
> Error in query: Undefined function: 'trans_array'. This function is neither a 
> registered temporary function nor a permanent function registered in the 
> database 'test_db'.; line 1 pos 7
> {code}
> The reason is that in 
> org.apache.spark.sql.internal.SessionState.FunctionResourceLoader.loadResource, 
> uri.toURL fails with "unknown protocol: hdfs":
> {code}
>   def addJar(path: String): Unit = {
> sparkSession.sparkContext.addJar(path)
> val uri = new Path(path).toUri
> val jarURL = if (uri.getScheme == null) {
>   // `path` is a local file path without a URL scheme
>

[jira] [Updated] (SPARK-18910) Can't use UDF that jar file in hdfs

2016-12-16 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-18910:
--
Description: 
When I create a UDF whose jar file is in HDFS, I can't use the UDF.

spark-sql> create function trans_array as 'com.test.udf.TransArray'  using jar 
'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';

spark-sql> describe function trans_array;
Function: test_db.trans_array
Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
Usage: N/A.
Time taken: 0.127 seconds, Fetched 3 row(s)

spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) from 
test_spark limit 10;
Error in query: Undefined function: 'trans_array'. This function is neither a 
registered temporary function nor a permanent function registered in the 
database 'test_db'.; line 1 pos 7


The reason is that in 
org.apache.spark.sql.internal.SessionState.FunctionResourceLoader.loadResource, 
uri.toURL fails with "unknown protocol: hdfs":

  def addJar(path: String): Unit = {
sparkSession.sparkContext.addJar(path)

val uri = new Path(path).toUri
val jarURL = if (uri.getScheme == null) {
  // `path` is a local file path without a URL scheme
  new File(path).toURI.toURL
} else {
  // `path` is a URL with a scheme
  {color:red}uri.toURL{color}
}
jarClassLoader.addURL(jarURL)
Thread.currentThread().setContextClassLoader(jarClassLoader)
  }


I think we should call the setURLStreamHandlerFactory method on URL with an instance of 
FsUrlStreamHandlerFactory, like this:

static {
// This method can be called at most once in a given JVM.
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
}



  was:
When I create a UDF whose jar file is in HDFS, I can't use the UDF.

spark-sql> create function trans_array as 'com.test.udf.TransArray'  using jar 
'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';

spark-sql> describe function trans_array;
Function: test_db.trans_array
Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
Usage: N/A.
Time taken: 0.127 seconds, Fetched 3 row(s)

spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) from 
test_spark limit 10;
Error in query: Undefined function: 'trans_array'. This function is neither a 
registered temporary function nor a permanent function registered in the 
database 'test_db'.; line 1 pos 7


The reason is that in 
org.apache.spark.sql.internal.SessionState.FunctionResourceLoader.loadResource, 
uri.toURL fails with "unknown protocol: hdfs":

  def addJar(path: String): Unit = {
sparkSession.sparkContext.addJar(path)

val uri = new Path(path).toUri
val jarURL = if (uri.getScheme == null) {
  // `path` is a local file path without a URL scheme
  new File(path).toURI.toURL
} else {
  // `path` is a URL with a scheme
  uri.toURL
}
jarClassLoader.addURL(jarURL)
Thread.currentThread().setContextClassLoader(jarClassLoader)
  }


I think we should call the setURLStreamHandlerFactory method on URL with an instance of 
FsUrlStreamHandlerFactory, like this:

static {
// This method can be called at most once in a given JVM.
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
}




> Can't use UDF that jar file in hdfs
> ---
>
> Key: SPARK-18910
> URL: https://issues.apache.org/jira/browse/SPARK-18910
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Hong Shen
>
> When I create a UDF whose jar file is in HDFS, I can't use the UDF.
> 
> spark-sql> create function trans_array as 'com.test.udf.TransArray'  using 
> jar 
> 'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';
> spark-sql> describe function trans_array;
> Function: test_db.trans_array
> Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
> Usage: N/A.
> Time taken: 0.127 seconds, Fetched 3 row(s)
> spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) 
> from test_spark limit 10;
> Error in query: Undefined function: 'trans_array'. This function is neither a 
> registered temporary function nor a permanent function registered in the 
> database 'test_db'.; line 1 pos 7
> 
> The reason is that in 
> org.apache.spark.sql.internal.SessionState.FunctionResourceLoader.loadResource, 
> uri.toURL fails with "unknown protocol: hdfs":
> 
>   def addJar(path: String): Unit = {
> sparkSession.sparkContext.addJar(path)
> val uri = new Path(path).toUri
> val jarURL = if (uri.getScheme == null) {
>   // `path` is a local file path without a URL scheme
>   new File(path).toURI.toURL
> } else {
>   // `path` is a URL 

[jira] [Commented] (SPARK-18910) Can't use UDF that jar file in hdfs

2016-12-16 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756385#comment-15756385
 ] 

Hong Shen commented on SPARK-18910:
---

Should I add a pull request to resolve this problem?

> Can't use UDF that jar file in hdfs
> ---
>
> Key: SPARK-18910
> URL: https://issues.apache.org/jira/browse/SPARK-18910
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Hong Shen
>
> When I create a UDF whose jar file is in HDFS, I can't use the UDF.
> 
> spark-sql> create function trans_array as 'com.test.udf.TransArray'  using 
> jar 
> 'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';
> spark-sql> describe function trans_array;
> Function: test_db.trans_array
> Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
> Usage: N/A.
> Time taken: 0.127 seconds, Fetched 3 row(s)
> spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) 
> from test_spark limit 10;
> Error in query: Undefined function: 'trans_array'. This function is neither a 
> registered temporary function nor a permanent function registered in the 
> database 'test_db'.; line 1 pos 7
> 
> The reason is that in 
> org.apache.spark.sql.internal.SessionState.FunctionResourceLoader.loadResource, 
> uri.toURL fails with "unknown protocol: hdfs":
> 
>   def addJar(path: String): Unit = {
> sparkSession.sparkContext.addJar(path)
> val uri = new Path(path).toUri
> val jarURL = if (uri.getScheme == null) {
>   // `path` is a local file path without a URL scheme
>   new File(path).toURI.toURL
> } else {
>   // `path` is a URL with a scheme
>   uri.toURL
> }
> jarClassLoader.addURL(jarURL)
> Thread.currentThread().setContextClassLoader(jarClassLoader)
>   }
> 
> I think we should call the setURLStreamHandlerFactory method on URL with an instance 
> of FsUrlStreamHandlerFactory, like this:
> 
> static {
>   // This method can be called at most once in a given JVM.
>   URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
> }
> 






[jira] [Updated] (SPARK-18910) Can't use UDF that jar file in hdfs

2016-12-16 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-18910:
--
Description: 
When I create a UDF whose jar file is in HDFS, I can't use the UDF.

spark-sql> create function trans_array as 'com.test.udf.TransArray'  using jar 
'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';

spark-sql> describe function trans_array;
Function: test_db.trans_array
Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
Usage: N/A.
Time taken: 0.127 seconds, Fetched 3 row(s)

spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) from 
test_spark limit 10;
Error in query: Undefined function: 'trans_array'. This function is neither a 
registered temporary function nor a permanent function registered in the 
database 'test_db'.; line 1 pos 7


The reason is that in 
org.apache.spark.sql.internal.SessionState.FunctionResourceLoader.loadResource, 
uri.toURL fails with "unknown protocol: hdfs":

  def addJar(path: String): Unit = {
sparkSession.sparkContext.addJar(path)

val uri = new Path(path).toUri
val jarURL = if (uri.getScheme == null) {
  // `path` is a local file path without a URL scheme
  new File(path).toURI.toURL
} else {
  // `path` is a URL with a scheme
  uri.toURL
}
jarClassLoader.addURL(jarURL)
Thread.currentThread().setContextClassLoader(jarClassLoader)
  }


I think we should call the setURLStreamHandlerFactory method on URL with an instance of 
FsUrlStreamHandlerFactory, like this:

static {
// This method can be called at most once in a given JVM.
URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
}



  was:
When I create a UDF that jar

spark-sql> create function trans_array as 'com.test.udf.TransArray'  using jar 
'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';

spark-sql> describe function trans_array;
Function: test_db.trans_array
Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
Usage: N/A.
Time taken: 0.127 seconds, Fetched 3 row(s)

spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) from 
test_spark limit 10;
Error in query: Undefined function: 'trans_array'. This function is neither a 
registered temporary function nor a permanent function registered in the 
database 'test_db'.; line 1 pos 7




> Can't use UDF that jar file in hdfs
> ---
>
> Key: SPARK-18910
> URL: https://issues.apache.org/jira/browse/SPARK-18910
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Hong Shen
>
> When I create a UDF whose jar file is in HDFS, I can't use the UDF.
> 
> spark-sql> create function trans_array as 'com.test.udf.TransArray'  using 
> jar 
> 'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';
> spark-sql> describe function trans_array;
> Function: test_db.trans_array
> Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
> Usage: N/A.
> Time taken: 0.127 seconds, Fetched 3 row(s)
> spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) 
> from test_spark limit 10;
> Error in query: Undefined function: 'trans_array'. This function is neither a 
> registered temporary function nor a permanent function registered in the 
> database 'test_db'.; line 1 pos 7
> 
> The reason is that in 
> org.apache.spark.sql.internal.SessionState.FunctionResourceLoader.loadResource, 
> uri.toURL fails with "unknown protocol: hdfs":
> 
>   def addJar(path: String): Unit = {
> sparkSession.sparkContext.addJar(path)
> val uri = new Path(path).toUri
> val jarURL = if (uri.getScheme == null) {
>   // `path` is a local file path without a URL scheme
>   new File(path).toURI.toURL
> } else {
>   // `path` is a URL with a scheme
>   uri.toURL
> }
> jarClassLoader.addURL(jarURL)
> Thread.currentThread().setContextClassLoader(jarClassLoader)
>   }
> 
> I think we should call the setURLStreamHandlerFactory method on URL with an instance 
> of FsUrlStreamHandlerFactory, like this:
> 
> static {
>   // This method can be called at most once in a given JVM.
>   URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
> }
> 






[jira] [Updated] (SPARK-18910) Can't use UDF that jar file in hdfs

2016-12-16 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-18910:
--
Summary: Can't use UDF that jar file in hdfs  (was: Can't use UDF that 
source file in hdfs)

> Can't use UDF that jar file in hdfs
> ---
>
> Key: SPARK-18910
> URL: https://issues.apache.org/jira/browse/SPARK-18910
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Hong Shen
>
> When I create a UDF that jar
> 
> spark-sql> create function trans_array as 'com.test.udf.TransArray'  using 
> jar 
> 'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';
> spark-sql> describe function trans_array;
> Function: test_db.trans_array
> Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
> Usage: N/A.
> Time taken: 0.127 seconds, Fetched 3 row(s)
> spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) 
> from test_spark limit 10;
> Error in query: Undefined function: 'trans_array'. This function is neither a 
> registered temporary function nor a permanent function registered in the 
> database 'test_db'.; line 1 pos 7
> 






[jira] [Updated] (SPARK-18910) Can't use UDF that source file in hdfs

2016-12-16 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-18910:
--
Description: 
When I create a UDF that jar

spark-sql> create function trans_array as 'com.test.udf.TransArray'  using jar 
'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';

spark-sql> describe function trans_array;
Function: test_db.trans_array
Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
Usage: N/A.
Time taken: 0.127 seconds, Fetched 3 row(s)

spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) from 
test_spark limit 10;
Error in query: Undefined function: 'trans_array'. This function is neither a 
registered temporary function nor a permanent function registered in the 
database 'test_db'.; line 1 pos 7



> Can't use UDF that source file in hdfs
> --
>
> Key: SPARK-18910
> URL: https://issues.apache.org/jira/browse/SPARK-18910
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Hong Shen
>
> When I create a UDF that jar
> 
> spark-sql> create function trans_array as 'com.test.udf.TransArray'  using 
> jar 
> 'hdfs://host1:9000/spark/dev/share/libs/spark-proxy-server-biz-service-impl-1.0.0.jar';
> spark-sql> describe function trans_array;
> Function: test_db.trans_array
> Class: com.alipay.spark.proxy.server.biz.service.impl.udf.TransArray
> Usage: N/A.
> Time taken: 0.127 seconds, Fetched 3 row(s)
> spark-sql> select trans_array(1, '\\|', id, position) as (id0, position0) 
> from test_spark limit 10;
> Error in query: Undefined function: 'trans_array'. This function is neither a 
> registered temporary function nor a permanent function registered in the 
> database 'test_db'.; line 1 pos 7
> 






[jira] [Created] (SPARK-18910) Can't use UDF that source file in hdfs

2016-12-16 Thread Hong Shen (JIRA)
Hong Shen created SPARK-18910:
-

 Summary: Can't use UDF that source file in hdfs
 Key: SPARK-18910
 URL: https://issues.apache.org/jira/browse/SPARK-18910
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.2
Reporter: Hong Shen









[jira] [Commented] (SPARK-16989) fileSystem.getFileStatus throw exception in EventLoggingListener

2016-08-10 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15416320#comment-15416320
 ] 

Hong Shen commented on SPARK-16989:
---

Thanks for your comment.
In our cluster, the log directory is a unified config. The clusters have been running 
for a long time, and machines are often added and removed; we don't want to restart 
the clusters just to add a log directory, so we want to add a config option that 
creates the log directory.

> fileSystem.getFileStatus throw exception in EventLoggingListener
> 
>
> Key: SPARK-16989
> URL: https://issues.apache.org/jira/browse/SPARK-16989
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Hong Shen
>Priority: Minor
>
> If the log directory does not exist, an exception is thrown, as shown in the following log.
> {code}
> 16/05/02 22:24:22 ERROR spark.SparkContext: Error initializing SparkContext.
> java.io.FileNotFoundException: File 
> file:/data/tdwadmin/tdwenv/tdwgaia/logs/sparkhistory/intermediate-done-dir 
> does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
>   at 
> org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:109)
>   at org.apache.spark.SparkContext.(SparkContext.scala:612)
>   at 
> org.apache.spark.deploy.yarn.SQLApplicationMaster.(SQLApplicationMaster.scala:78)
>   at 
> org.apache.spark.deploy.yarn.SQLApplicationMaster.(SQLApplicationMaster.scala:46)
>   at 
> org.apache.spark.deploy.yarn.SQLApplicationMaster$$anonfun$main$1.apply$mcV$sp(SQLApplicationMaster.scala:311)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:70)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:69)
>   at 
> org.apache.spark.deploy.yarn.SQLApplicationMaster$.main(SQLApplicationMaster.scala:310)
>   at 
> org.apache.spark.deploy.yarn.SQLApplicationMaster.main(SQLApplicationMaster.scala)
> {code}
> {code}
> if (!fileSystem.getFileStatus(new Path(logBaseDir)).isDirectory) {
>   throw new IllegalArgumentException(s"Log directory $logBaseDir does not 
> exist.")
> }
> {code}
> There are two problems.
> 1. The check should first test fileSystem.exists(new Path(logBaseDir)), to 
> prevent getFileStatus from throwing an exception.
> 2. I think we should add an option that lets the ApplicationMaster create the log 
> directory, like this:
> {code}
> if (!fileSystem.exists(lp) &&
>   sparkConf.getBoolean("spark.eventLog.create.if.baseDir.not.exist", 
> true)) {
>   fileSystem.mkdirs(lp)
> }
> {code}






[jira] [Updated] (SPARK-16989) fileSystem.getFileStatus throw exception in EventLoggingListener

2016-08-10 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-16989:
--
Description: 
If the log directory does not exist, an exception is thrown, as shown in the following log.
{code}
16/05/02 22:24:22 ERROR spark.SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File 
file:/data/tdwadmin/tdwenv/tdwgaia/logs/sparkhistory/intermediate-done-dir does 
not exist
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
at 
org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:109)
at org.apache.spark.SparkContext.(SparkContext.scala:612)
at 
org.apache.spark.deploy.yarn.SQLApplicationMaster.(SQLApplicationMaster.scala:78)
at 
org.apache.spark.deploy.yarn.SQLApplicationMaster.(SQLApplicationMaster.scala:46)
at 
org.apache.spark.deploy.yarn.SQLApplicationMaster$$anonfun$main$1.apply$mcV$sp(SQLApplicationMaster.scala:311)
at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:70)
at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at 
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:69)
at 
org.apache.spark.deploy.yarn.SQLApplicationMaster$.main(SQLApplicationMaster.scala:310)
at 
org.apache.spark.deploy.yarn.SQLApplicationMaster.main(SQLApplicationMaster.scala)
{code}

{code}
if (!fileSystem.getFileStatus(new Path(logBaseDir)).isDirectory) {
  throw new IllegalArgumentException(s"Log directory $logBaseDir does not 
exist.")
}
{code}

There are two problems.
1. The check should first test fileSystem.exists(new Path(logBaseDir)), to 
prevent getFileStatus from throwing an exception.
2. I think we should add an option that lets the ApplicationMaster create the log 
directory, like this:
{code}
if (!fileSystem.exists(lp) &&
  sparkConf.getBoolean("spark.eventLog.create.if.baseDir.not.exist", true)) 
{
  fileSystem.mkdirs(lp)
}
{code}
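For illustration, a self-contained Scala sketch of the two-step guard described above: 
check existence first, then either create the directory behind the flag proposed in this 
issue or fail with a clear message. The object and method names are hypothetical, and the 
flag is the one suggested here, not an existing Spark configuration:
{code}
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkConf

object EventLogDirCheck {
  // Sketch only: validate or (optionally) create the event-log base directory.
  def ensureLogBaseDir(fileSystem: FileSystem, sparkConf: SparkConf, logBaseDir: String): Unit = {
    val lp = new Path(logBaseDir)
    if (!fileSystem.exists(lp)) {
      // Flag name taken from this issue's proposal; it is not a real Spark setting.
      if (sparkConf.getBoolean("spark.eventLog.create.if.baseDir.not.exist", false)) {
        fileSystem.mkdirs(lp)
      } else {
        throw new IllegalArgumentException(s"Log directory $logBaseDir does not exist.")
      }
    } else if (!fileSystem.getFileStatus(lp).isDirectory) {
      throw new IllegalArgumentException(s"Log directory $logBaseDir is not a directory.")
    }
  }
}
{code}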

  was:
16/05/02 22:24:22 ERROR spark.SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File 
file:/data/tdwadmin/tdwenv/tdwgaia/logs/sparkhistory/intermediate-done-dir does 
not exist
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
at 
org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:109)
at org.apache.spark.SparkContext.(SparkContext.scala:612)
at 
org.apache.spark.deploy.yarn.SQLApplicationMaster.(SQLApplicationMaster.scala:78)
at 
org.apache.spark.deploy.yarn.SQLApplicationMaster.(SQLApplicationMaster.scala:46)
at 
org.apache.spark.deploy.yarn.SQLApplicationMaster$$anonfun$main$1.apply$mcV$sp(SQLApplicationMaster.scala:311)
at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:70)
at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at 
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:69)
at 
org.apache.spark.deploy.yarn.SQLApplicationMaster$.main(SQLApplicationMaster.scala:310)
at 
org.apache.spark.deploy.yarn.SQLApplicationMaster.main(SQLApplicationMaster.scala)



> fileSystem.getFileStatus throw exception in EventLoggingListener
> 
>
> Key: SPARK-16989
> URL: https://issues.apache.org/jira/browse/SPARK-16989
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Hong Shen
>
> If the log directory does not exist, an exception is thrown, as shown in the following log.
> {code}
> 16/05/02 22:24:22 ERROR spark.SparkContext: Error initializing SparkContext.
> java.io.FileNotFoundException: File 
> file:/data/tdwadmin/tdwenv/tdwgaia/logs/sparkhistory/intermediate-done-dir 
> does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
>   at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
>   at 
> org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:109)
>   at 

[jira] [Created] (SPARK-16989) fileSystem.getFileStatus throw exception in EventLoggingListener

2016-08-10 Thread Hong Shen (JIRA)
Hong Shen created SPARK-16989:
-

 Summary: fileSystem.getFileStatus throw exception in 
EventLoggingListener
 Key: SPARK-16989
 URL: https://issues.apache.org/jira/browse/SPARK-16989
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Hong Shen


16/05/02 22:24:22 ERROR spark.SparkContext: Error initializing SparkContext.
java.io.FileNotFoundException: File 
file:/data/tdwadmin/tdwenv/tdwgaia/logs/sparkhistory/intermediate-done-dir does 
not exist
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
at 
org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:109)
at org.apache.spark.SparkContext.(SparkContext.scala:612)
at 
org.apache.spark.deploy.yarn.SQLApplicationMaster.(SQLApplicationMaster.scala:78)
at 
org.apache.spark.deploy.yarn.SQLApplicationMaster.(SQLApplicationMaster.scala:46)
at 
org.apache.spark.deploy.yarn.SQLApplicationMaster$$anonfun$main$1.apply$mcV$sp(SQLApplicationMaster.scala:311)
at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:70)
at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:69)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at 
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:69)
at 
org.apache.spark.deploy.yarn.SQLApplicationMaster$.main(SQLApplicationMaster.scala:310)
at 
org.apache.spark.deploy.yarn.SQLApplicationMaster.main(SQLApplicationMaster.scala)







[jira] [Commented] (SPARK-16985) SQL Output maybe overrided

2016-08-09 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15414644#comment-15414644
 ] 

Hong Shen commented on SPARK-16985:
---

The reason is that the output file name uses SimpleDateFormat("MMddHHmm"), so if two SQL 
jobs insert into the same table within the same minute, one output overwrites the other. 
I think we should change the date format to "MMddHHmmss"; in our cluster, no SQL job 
finishes within one second.
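To make the collision concrete, a small Scala illustration of the two patterns mentioned 
above (the real path layout used by SparkHiveHadoopWriter is more involved; this only 
shows why minute precision can collide):
{code}
import java.text.SimpleDateFormat
import java.util.Date

// Two jobs started within the same minute share the minute-precision id and can
// overwrite each other's output; second precision keeps them apart as long as no
// two jobs start within the same second.
val now = new Date()
val minuteId = new SimpleDateFormat("MMddHHmm").format(now)
val secondId = new SimpleDateFormat("MMddHHmmss").format(now)
println(s"minute-precision id: $minuteId")
println(s"second-precision id: $secondId")
{code}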

> SQL Output maybe overrided
> --
>
> Key: SPARK-16985
> URL: https://issues.apache.org/jira/browse/SPARK-16985
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hong Shen
>
> In our cluster, the SQL output is sometimes overwritten. When I submit several SQL 
> jobs that all insert into the same table, and each job takes less than one minute, 
> here is the detail:
> 1. sql1, 11:03, insert into table.
> 2. sql2, 11:04:11, insert into table.
> 3. sql3, 11:04:48, insert into table.
> 4. sql4, 11:05, insert into table.
> 5. sql5, 11:06, insert into table.
> Sql3's output file overrides sql2's output file. Here is the log:
> {code}
> 16/05/04 11:04:11 INFO hive.SparkHiveHadoopWriter: 
> XXfinalPath=hdfs://tl-sng-gdt-nn-tdw.tencent-distribute.com:54310/tmp/assorz/tdw-tdwadmin/20160504/04559505496526517_-1_1204544348/1/_tmp.p_20160428/attempt_201605041104_0001_m_00_1
> 16/05/04 11:04:48 INFO hive.SparkHiveHadoopWriter: 
> XXfinalPath=hdfs://tl-sng-gdt-nn-tdw.tencent-distribute.com:54310/tmp/assorz/tdw-tdwadmin/20160504/04559505496526517_-1_212180468/1/_tmp.p_20160428/attempt_201605041104_0001_m_00_1
> {code}






[jira] [Updated] (SPARK-16985) SQL Output maybe overrided

2016-08-09 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-16985:
--
Description: 
In our cluster, the SQL output is sometimes overwritten. When I submit several SQL 
jobs that all insert into the same table, and each job takes less than one minute, 
here is the detail:

1. sql1, 11:03, insert into table.
2. sql2, 11:04:11, insert into table.
3. sql3, 11:04:48, insert into table.
4. sql4, 11:05, insert into table.
5. sql5, 11:06, insert into table.
Sql3's output file overrides sql2's output file. Here is the log:
{code}
16/05/04 11:04:11 INFO hive.SparkHiveHadoopWriter: 
XXfinalPath=hdfs://tl-sng-gdt-nn-tdw.tencent-distribute.com:54310/tmp/assorz/tdw-tdwadmin/20160504/04559505496526517_-1_1204544348/1/_tmp.p_20160428/attempt_201605041104_0001_m_00_1

16/05/04 11:04:48 INFO hive.SparkHiveHadoopWriter: 
XXfinalPath=hdfs://tl-sng-gdt-nn-tdw.tencent-distribute.com:54310/tmp/assorz/tdw-tdwadmin/20160504/04559505496526517_-1_212180468/1/_tmp.p_20160428/attempt_201605041104_0001_m_00_1
{code}

> SQL Output maybe overrided
> --
>
> Key: SPARK-16985
> URL: https://issues.apache.org/jira/browse/SPARK-16985
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hong Shen
>
> In our cluster, the SQL output is sometimes overwritten. When I submit several SQL 
> jobs that all insert into the same table, and each job takes less than one minute, 
> here is the detail:
> 1. sql1, 11:03, insert into table.
> 2. sql2, 11:04:11, insert into table.
> 3. sql3, 11:04:48, insert into table.
> 4. sql4, 11:05, insert into table.
> 5. sql5, 11:06, insert into table.
> Sql3's output file overrides sql2's output file. Here is the log:
> {code}
> 16/05/04 11:04:11 INFO hive.SparkHiveHadoopWriter: 
> XXfinalPath=hdfs://tl-sng-gdt-nn-tdw.tencent-distribute.com:54310/tmp/assorz/tdw-tdwadmin/20160504/04559505496526517_-1_1204544348/1/_tmp.p_20160428/attempt_201605041104_0001_m_00_1
> 16/05/04 11:04:48 INFO hive.SparkHiveHadoopWriter: 
> XXfinalPath=hdfs://tl-sng-gdt-nn-tdw.tencent-distribute.com:54310/tmp/assorz/tdw-tdwadmin/20160504/04559505496526517_-1_212180468/1/_tmp.p_20160428/attempt_201605041104_0001_m_00_1
> {code}






[jira] [Created] (SPARK-16985) SQL Output maybe overrided

2016-08-09 Thread Hong Shen (JIRA)
Hong Shen created SPARK-16985:
-

 Summary: SQL Output maybe overrided
 Key: SPARK-16985
 URL: https://issues.apache.org/jira/browse/SPARK-16985
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Hong Shen









[jira] [Commented] (SPARK-16709) Task with commit failed will retry infinite when speculation set to true

2016-08-08 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411620#comment-15411620
 ] 

Hong Shen commented on SPARK-16709:
---

If you agree it's a bug, then I will reopen the jira and add a patch to fix it.

> Task with commit failed will retry infinite when speculation set to true
> 
>
> Key: SPARK-16709
> URL: https://issues.apache.org/jira/browse/SPARK-16709
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
> Attachments: commit failed.png
>
>
> In our cluster, we set spark.speculation=true, but when a task throws an 
> exception at SparkHadoopMapRedUtil.performCommit(), the task can retry 
> infinitely.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83






[jira] [Commented] (SPARK-16709) Task with commit failed will retry infinite when speculation set to true

2016-08-08 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411616#comment-15411616
 ] 

Hong Shen commented on SPARK-16709:
---

Sorry for the late reply. This is different from SPARK-14915; in fact, we have 
already applied the change that SPARK-14915 made.
{code}
sched.dagScheduler.taskEnded(tasks(index), reason, null, null, info, 
taskMetrics)
// If speculative enable, when one task succeed, the other task with state 
RUNNING will be killed.
// The killed task will statusUpdate with state KILLED/FAILED.
// In this case, the task should not be re-add.
if (!successful(index)) {
  addPendingTask(index)
}
if (!isZombie && state != TaskState.KILLED
{code}

Here is the reason; the log is:
{code}
16/07/28 05:22:15 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 
(TID 175, 10.215.146.81, partition 1,PROCESS_LOCAL, 1930 bytes)

16/07/28 05:28:35 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.1 
(TID 207, 10.196.147.232, partition 1,PROCESS_LOCAL, 1930 bytes)

16/07/28 05:28:48 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 
(TID 175) in 393261 ms on 10.215.146.81 (3/50)

16/07/28 05:34:11 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 1.1 
(TID 207, 10.196.147.232): TaskCommitDenied (Driver denied task commit) for 
job: 1, partition: 1, attemptNumber: 207
{code}
1. Task 1.0 in stage 1.0 starts.
2. Stage 1.0 fails; stage 1.1 starts.
3. Task 1.0 in stage 1.1 starts.
4. Task 1.0 in stage 1.0 finishes.
5. Task 1.0 in stage 1.1 fails with a TaskCommitDenied exception, then retries 
forever (see the sketch below).
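To see why step 5 loops, here is a minimal, self-contained Scala model of the "only one 
attempt may commit a partition" rule; it is an illustration only and does not reflect 
Spark's actual OutputCommitCoordinator API:
{code}
// Once one attempt has been authorized to commit a partition, every other attempt
// is denied; if the denied attempt is simply re-queued, it is denied again on
// every retry.
class CommitCoordinatorModel {
  private val authorized = scala.collection.mutable.Map[Int, (Int, Int)]()

  def canCommit(partition: Int, stageAttempt: Int, taskAttempt: Int): Boolean = synchronized {
    val requester = (stageAttempt, taskAttempt)
    authorized.getOrElseUpdate(partition, requester) == requester
  }
}

val coordinator = new CommitCoordinatorModel
println(coordinator.canCommit(partition = 1, stageAttempt = 0, taskAttempt = 0)) // true:  task 1.0 in stage 1.0
println(coordinator.canCommit(partition = 1, stageAttempt = 1, taskAttempt = 0)) // false: task 1.0 in stage 1.1,
                                                                                 // and every later retry as well
{code}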


> Task with commit failed will retry infinite when speculation set to true
> 
>
> Key: SPARK-16709
> URL: https://issues.apache.org/jira/browse/SPARK-16709
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
> Attachments: commit failed.png
>
>
> In our cluster, we set spark.speculation=true, but when a task throws an 
> exception at SparkHadoopMapRedUtil.performCommit(), the task can retry 
> infinitely.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83






[jira] [Comment Edited] (SPARK-16709) Task with commit failed will retry infinite when speculation set to true

2016-07-25 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15393062#comment-15393062
 ] 

Hong Shen edited comment on SPARK-16709 at 7/26/16 2:10 AM:


It is different; the task has no successful attempt. This happens when a task attempt 
fails during performCommit(): the following attempts can't commit any more, 
because only one task attempt is allowed to performCommit(), even though the first 
attempt failed at performCommit().



was (Author: shenhong):
It different, the task has no attempt succeed. this happen when a task attempt 
failed when performCommit(), the following attempts can commit any more, 
because just one task can performCommit(), even though the first attempt failed 
at performCommit().


> Task with commit failed will retry infinite when speculation set to true
> 
>
> Key: SPARK-16709
> URL: https://issues.apache.org/jira/browse/SPARK-16709
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
> Attachments: commit failed.png
>
>
> In our cluster, we set spark.speculation=true, but when a task throws an 
> exception at SparkHadoopMapRedUtil.performCommit(), the task can retry 
> infinitely.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83






[jira] [Commented] (SPARK-16709) Task with commit failed will retry infinite when speculation set to true

2016-07-25 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15393062#comment-15393062
 ] 

Hong Shen commented on SPARK-16709:
---

It is different; the task has no successful attempt. This happens when a task attempt 
fails during performCommit(): the following attempts can't commit any more, 
because only one task attempt is allowed to performCommit(), even though the first 
attempt failed at performCommit().


> Task with commit failed will retry infinite when speculation set to true
> 
>
> Key: SPARK-16709
> URL: https://issues.apache.org/jira/browse/SPARK-16709
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
> Attachments: commit failed.png
>
>
> In our cluster, we set spark.speculation=true, but when a task throws an 
> exception at SparkHadoopMapRedUtil.performCommit(), the task can retry 
> infinitely.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83






[jira] [Commented] (SPARK-16708) ExecutorAllocationManager.numRunningTasks can be negative when stage retry

2016-07-25 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15393054#comment-15393054
 ] 

Hong Shen commented on SPARK-16708:
---

It's different. ExecutorAllocationManager.numRunningTasks going negative can cause 
ExecutorAllocationManager.maxNumExecutorsNeeded to become negative; sometimes this 
leads to many tasks staying pending while ExecutorAllocationManager does not 
allocate executors.
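As a minimal, self-contained Scala sketch (an assumption about one possible mitigation, 
not the fix that was eventually applied in Spark), a running-task counter could simply 
refuse to go below zero when a late task-end event arrives after a stage-completion reset:
{code}
// Clamp at zero so an onTaskEnd that arrives after onStageCompleted has already
// reset the counter cannot drive the running-task count (and anything derived
// from it, such as maxNumExecutorsNeeded) negative.
class RunningTaskCounter {
  private var numRunningTasks = 0

  def taskStarted(): Unit = synchronized { numRunningTasks += 1 }

  def taskEnded(): Unit = synchronized {
    numRunningTasks = math.max(numRunningTasks - 1, 0)
  }

  def stageCompleted(): Unit = synchronized { numRunningTasks = 0 }

  def running: Int = synchronized { numRunningTasks }
}
{code}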

> ExecutorAllocationManager.numRunningTasks can be negative when stage retry
> --
>
> Key: SPARK-16708
> URL: https://issues.apache.org/jira/browse/SPARK-16708
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> When a task fails with a fetch failure, the stage completes and is retried. When the 
> stage completes, ExecutorAllocationManager.numRunningTasks is set to 0; here is 
> the code:
> {code}
> override def onStageCompleted(stageCompleted: 
> SparkListenerStageCompleted): Unit = {
>   val stageId = stageCompleted.stageInfo.stageId
>   allocationManager.synchronized {
> stageIdToNumTasks -= stageId
> stageIdToTaskIndices -= stageId
> stageIdToExecutorPlacementHints -= stageId
> // Update the executor placement hints
> updateExecutorPlacementHints()
> // If this is the last stage with pending tasks, mark the scheduler 
> queue as empty
> // This is needed in case the stage is aborted for any reason
> if (stageIdToNumTasks.isEmpty) {
>   allocationManager.onSchedulerQueueEmpty()
>   if (numRunningTasks != 0) {
> logWarning("No stages are running, but numRunningTasks != 0")
> numRunningTasks = 0
>   }
> }
>   }
> }
> {code}
> But when one of the stage's still-running tasks finishes, numRunningTasks is decremented 
> by 1, so numRunningTasks can become negative, which can make maxNeeded negative.
> {code}
> override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
>   val executorId = taskEnd.taskInfo.executorId
>   val taskId = taskEnd.taskInfo.taskId
>   val taskIndex = taskEnd.taskInfo.index
>   val stageId = taskEnd.stageId
>   allocationManager.synchronized {
> numRunningTasks -= 1
> {code}






[jira] [Commented] (SPARK-16707) TransportClientFactory.createClient may throw NPE

2016-07-25 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15392987#comment-15392987
 ] 

Hong Shen commented on SPARK-16707:
---

It has happened many times in our cluster (with thousands of machines), but it is too 
hard to reproduce; I can only retry in the way stackoverflow.com describes.

> TransportClientFactory.createClient may throw NPE
> -
>
> Key: SPARK-16707
> URL: https://issues.apache.org/jira/browse/SPARK-16707
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> I have encountered some NullPointerExceptions in 
> TransportClientFactory.createClient in my cluster; here is the 
> stack trace.
> {code}
> org.apache.spark.shuffle.FetchFailedException
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:326)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:303)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:53)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:511)
>   at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.(TungstenAggregationIterator.scala:686)
>   at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
>   at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:215)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:144)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleClient$1.createAndStart(ExternalShuffleClient.java:107)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:146)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:126)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleClient.fetchBlocks(ExternalShuffleClient.java:116)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:155)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchUpToMaxBytes(ShuffleBlockFetcherIterator.scala:319)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:299)
>   ... 32 more
> {code}
> The code is at 
> 

[jira] [Updated] (SPARK-16709) Task with commit failed will retry infinite when speculation set to true

2016-07-25 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-16709:
--
Attachment: commit failed.png

> Task with commit failed will retry infinite when speculation set to true
> 
>
> Key: SPARK-16709
> URL: https://issues.apache.org/jira/browse/SPARK-16709
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
> Attachments: commit failed.png
>
>
> In our cluster, we set spark.speculation=true, but when a task throws an 
> exception at SparkHadoopMapRedUtil.performCommit(), the task can retry 
> infinitely.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83






[jira] [Updated] (SPARK-16709) Task with commit failed will retry infinite when speculation set to true

2016-07-25 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-16709:
--
Attachment: (was: `N0)80~9A(AMT~2}%Q`M[RY.png)

> Task with commit failed will retry infinite when speculation set to true
> 
>
> Key: SPARK-16709
> URL: https://issues.apache.org/jira/browse/SPARK-16709
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> In our cluster, we set spark.speculation=true, but when a task throws an 
> exception at SparkHadoopMapRedUtil.performCommit(), the task can be retried 
> indefinitely.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16709) Task with commit failed will retry infinite when speculation set to true

2016-07-25 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-16709:
--
Attachment: `N0)80~9A(AMT~2}%Q`M[RY.png

> Task with commit failed will retry infinite when speculation set to true
> 
>
> Key: SPARK-16709
> URL: https://issues.apache.org/jira/browse/SPARK-16709
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> In our cluster, we set spark.speculation=true, but when a task throws an 
> exception at SparkHadoopMapRedUtil.performCommit(), the task can be retried 
> indefinitely.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16709) Task with commit failed will retry infinite when speculation set to true

2016-07-25 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-16709:
--
Description: 
In our cluster, we set spark.speculation=true, but when a task throws an exception 
at SparkHadoopMapRedUtil.performCommit(), the task can be retried indefinitely.
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83



  was:
In our cluster, we set spark.speculation=true, but when a task throws an exception 
at SparkHadoopMapRedUtil.performCommit(), the task can be retried indefinitely.
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83


Index  ID  Attempt  Status  Locality Level  Executor ID / Host  Launch Time  Duration  GC Time  Output Size / Records  Shuffle Read Size / Records  Errors
0  30910  SUCCESS  PROCESS_LOCAL  1554 / 10.215.134.227  2016/07/25 02:33:39  19 min  1.2 min  0.0 B / 149525281121.5 MB / 15286949
0  37940  FAILED  PROCESS_LOCAL  2027 / 10.215.146.29  2016/07/25 02:47:04  /  /  TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 3794
0  40941  FAILED  PROCESS_LOCAL  2546 / 10.196.150.233  2016/07/25 03:08:58  /  /  TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 4094
0  43842  FAILED  PROCESS_LOCAL  2823 / 10.215.155.155  2016/07/25 03:29:50  /  /  TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 4384
0  45733  FAILED  PROCESS_LOCAL  3011 / 10.215.139.24  2016/07/25 03:46:50  /  /  TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 4573
0  48054  SUCCESS  PROCESS_LOCAL  3246 / 10.196.138.215  2016/07/25 04:06:12  0 ms  1.3 min  0.0 B / 149524481121.5 MB / 15286949
1  30920  SUCCESS  PROCESS_LOCAL  1505 / 10.196.130.102  2016/07/25 02:33:39  22 min  4.9 min  0.0 B / 149536921121.9 MB / 15288628
1  37950  FAILED  PROCESS_LOCAL  2253 / 10.196.145.33  2016/07/25 02:48:28  /  /  TaskCommitDenied (Driver denied task commit) for job: 2, partition: 1, attemptNumber: 3795
1  40741  FAILED  PROCESS_LOCAL  2493 / 10.196.148.109  2016/07/25 03:08:49  /  /  TaskCommitDenied (Driver denied task commit) for job: 2, partition: 1, attemptNumber: 4074
1  42632  FAILED  PROCESS_LOCAL  2705 / 10.196.149.21  2016/07/25 03:25:05  /  /  TaskCommitDenied (Driver denied task commit) for job: 2, partition: 1, attemptNumber: 4263


> Task with commit failed will retry infinite when speculation set to true
> 
>
> Key: SPARK-16709
> URL: https://issues.apache.org/jira/browse/SPARK-16709
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> In our cluster, we set spark.speculation=true, but when a task throws an 
> exception at SparkHadoopMapRedUtil.performCommit(), the task can be retried 
> indefinitely.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83
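
A minimal Scala sketch of why such a task can loop forever, assuming (as an 
illustration only, not Spark's actual scheduler code) that commit-denied failures 
are not counted toward spark.task.maxFailures; a separate per-partition cap would 
break the loop:
{code}
import scala.collection.mutable

object CommitDeniedRetrySketch {
  // Hypothetical cap; nothing like it exists for commit-denied failures today.
  val maxCommitDeniedRetries = 10
  private val deniedCount = mutable.Map.empty[Int, Int].withDefaultValue(0)

  // countsTowardFailures = false models failure reasons (like TaskCommitDenied)
  // that never increment the normal failure counter, so the task keeps being
  // resubmitted unless something else bounds it.
  def onTaskFailed(partition: Int, countsTowardFailures: Boolean): Unit = {
    if (!countsTowardFailures) {
      deniedCount(partition) += 1
      if (deniedCount(partition) > maxCommitDeniedRetries) {
        sys.error(s"Aborting: partition $partition was denied commit " +
          s"${deniedCount(partition)} times")
      }
    }
  }
}
{code}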



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16709) Task with commit failed will retry infinite when speculation set to true

2016-07-25 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-16709:
--
Description: 
In our cluster, we set spark.speculation=true, but when a task throws an exception 
at SparkHadoopMapRedUtil.performCommit(), the task can be retried indefinitely.
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83


Index  ID  Attempt  Status  Locality Level  Executor ID / Host  Launch Time  Duration  GC Time  Output Size / Records  Shuffle Read Size / Records  Errors
0  30910  SUCCESS  PROCESS_LOCAL  1554 / 10.215.134.227  2016/07/25 02:33:39  19 min  1.2 min  0.0 B / 149525281121.5 MB / 15286949
0  37940  FAILED  PROCESS_LOCAL  2027 / 10.215.146.29  2016/07/25 02:47:04  /  /  TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 3794
0  40941  FAILED  PROCESS_LOCAL  2546 / 10.196.150.233  2016/07/25 03:08:58  /  /  TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 4094
0  43842  FAILED  PROCESS_LOCAL  2823 / 10.215.155.155  2016/07/25 03:29:50  /  /  TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 4384
0  45733  FAILED  PROCESS_LOCAL  3011 / 10.215.139.24  2016/07/25 03:46:50  /  /  TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 4573
0  48054  SUCCESS  PROCESS_LOCAL  3246 / 10.196.138.215  2016/07/25 04:06:12  0 ms  1.3 min  0.0 B / 149524481121.5 MB / 15286949
1  30920  SUCCESS  PROCESS_LOCAL  1505 / 10.196.130.102  2016/07/25 02:33:39  22 min  4.9 min  0.0 B / 149536921121.9 MB / 15288628
1  37950  FAILED  PROCESS_LOCAL  2253 / 10.196.145.33  2016/07/25 02:48:28  /  /  TaskCommitDenied (Driver denied task commit) for job: 2, partition: 1, attemptNumber: 3795
1  40741  FAILED  PROCESS_LOCAL  2493 / 10.196.148.109  2016/07/25 03:08:49  /  /  TaskCommitDenied (Driver denied task commit) for job: 2, partition: 1, attemptNumber: 4074
1  42632  FAILED  PROCESS_LOCAL  2705 / 10.196.149.21  2016/07/25 03:25:05  /  /  TaskCommitDenied (Driver denied task commit) for job: 2, partition: 1, attemptNumber: 4263

  was:
In our cluster, we set spark.speculation=true, but when a task throws an exception 
at SparkHadoopMapRedUtil.performCommit(), the task can be retried indefinitely.
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83


> Task with commit failed will retry infinite when speculation set to true
> 
>
> Key: SPARK-16709
> URL: https://issues.apache.org/jira/browse/SPARK-16709
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> In our cluster, we set spark.speculation=true, but when a task throws an 
> exception at SparkHadoopMapRedUtil.performCommit(), the task can be retried 
> indefinitely.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83
> Index  ID  Attempt  Status  Locality Level  Executor ID / Host  Launch Time  Duration  GC Time  Output Size / Records  Shuffle Read Size / Records  Errors
> 0  30910  SUCCESS  PROCESS_LOCAL  1554 / 10.215.134.227  2016/07/25 02:33:39  19 min  1.2 min  0.0 B / 149525281121.5 MB / 15286949
> 0  37940  FAILED  PROCESS_LOCAL  2027 / 10.215.146.29  2016/07/25 02:47:04  /  /  TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 3794
> 0  40941  FAILED  PROCESS_LOCAL  2546 / 10.196.150.233  2016/07/25 03:08:58  /  /  TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 4094
> 0  43842  FAILED  PROCESS_LOCAL  2823 / 10.215.155.155  2016/07/25 03:29:50  /  /  TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 4384
> 0  45733  FAILED  PROCESS_LOCAL  3011 / 10.215.139.24  2016/07/25 03:46:50  /  /  TaskCommitDenied (Driver denied task commit) for job: 2, partition: 0, attemptNumber: 4573
> 0  48054  SUCCESS  PROCESS_LOCAL  3246 / 10.196.138.215  2016/07/25 

[jira] [Updated] (SPARK-16709) Task with commit failed will retry infinite when speculation set to true

2016-07-25 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-16709:
--
Description: 
In our cluster, we set spark. When a task throws an exception at 
SparkHadoopMapRedUtil.performCommit(), the task can be retried indefinitely. 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83

> Task with commit failed will retry infinite when speculation set to true
> 
>
> Key: SPARK-16709
> URL: https://issues.apache.org/jira/browse/SPARK-16709
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> In our cluster, we set spark. When a task throws an exception at 
> SparkHadoopMapRedUtil.performCommit(), the task can be retried indefinitely. 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16709) Task with commit failed will retry infinite when speculation set to true

2016-07-25 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-16709:
--
Description: 
In our cluster, we set spark.speculation=true, but when a task throws an exception 
at SparkHadoopMapRedUtil.performCommit(), the task can be retried indefinitely.
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83

  was:
In our cluster, we set spark. When a task throws an exception at 
SparkHadoopMapRedUtil.performCommit(), the task can be retried indefinitely. 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83


> Task with commit failed will retry infinite when speculation set to true
> 
>
> Key: SPARK-16709
> URL: https://issues.apache.org/jira/browse/SPARK-16709
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> In our cluster, we set spark.speculation=true, but when a task throws an 
> exception at SparkHadoopMapRedUtil.performCommit(), the task can be retried 
> indefinitely.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16709) Task with commit failed will retry infinite when speculation set to true

2016-07-25 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-16709:
--
Affects Version/s: 1.6.0

> Task with commit failed will retry infinite when speculation set to true
> 
>
> Key: SPARK-16709
> URL: https://issues.apache.org/jira/browse/SPARK-16709
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16709) Task with commit failed will retry infinite when speculation set to true

2016-07-25 Thread Hong Shen (JIRA)
Hong Shen created SPARK-16709:
-

 Summary: Task with commit failed will retry infinite when 
speculation set to true
 Key: SPARK-16709
 URL: https://issues.apache.org/jira/browse/SPARK-16709
 Project: Spark
  Issue Type: Bug
Reporter: Hong Shen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16708) ExecutorAllocationManager.numRunningTasks can be negative when stage retry

2016-07-25 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-16708:
--
Description: 
When a task fetch fails, the stage completes and is retried. When the stage 
completes, ExecutorAllocationManager.numRunningTasks is set to 0; here is the 
code:
{code}
override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): 
Unit = {
  val stageId = stageCompleted.stageInfo.stageId
  allocationManager.synchronized {
stageIdToNumTasks -= stageId
stageIdToTaskIndices -= stageId
stageIdToExecutorPlacementHints -= stageId

// Update the executor placement hints
updateExecutorPlacementHints()

// If this is the last stage with pending tasks, mark the scheduler 
queue as empty
// This is needed in case the stage is aborted for any reason
if (stageIdToNumTasks.isEmpty) {
  allocationManager.onSchedulerQueueEmpty()
  if (numRunningTasks != 0) {
logWarning("No stages are running, but numRunningTasks != 0")
numRunningTasks = 0
  }
}
  }
}
{code}
But when the stage's still-running tasks finish, numRunningTasks is decremented by 1 
for each of them, so numRunningTasks becomes negative, which can make maxNeeded negative.
{code}
override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
  val executorId = taskEnd.taskInfo.executorId
  val taskId = taskEnd.taskInfo.taskId
  val taskIndex = taskEnd.taskInfo.index
  val stageId = taskEnd.stageId
  allocationManager.synchronized {
numRunningTasks -= 1

{code}

  was:
When a task fetch fails, the stage completes and is retried. When the stage 
completes, 
 if (stageIdToNumTasks.isEmpty) {
  allocationManager.onSchedulerQueueEmpty()
  if (numRunningTasks != 0) {
logWarning("No stages are running, but numRunningTasks != 0")
numRunningTasks = 0
  }
}


> ExecutorAllocationManager.numRunningTasks can be negative when stage retry
> --
>
> Key: SPARK-16708
> URL: https://issues.apache.org/jira/browse/SPARK-16708
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> When a task fetch fails, the stage completes and is retried. When the stage 
> completes, ExecutorAllocationManager.numRunningTasks is set to 0; here is 
> the code:
> {code}
> override def onStageCompleted(stageCompleted: 
> SparkListenerStageCompleted): Unit = {
>   val stageId = stageCompleted.stageInfo.stageId
>   allocationManager.synchronized {
> stageIdToNumTasks -= stageId
> stageIdToTaskIndices -= stageId
> stageIdToExecutorPlacementHints -= stageId
> // Update the executor placement hints
> updateExecutorPlacementHints()
> // If this is the last stage with pending tasks, mark the scheduler 
> queue as empty
> // This is needed in case the stage is aborted for any reason
> if (stageIdToNumTasks.isEmpty) {
>   allocationManager.onSchedulerQueueEmpty()
>   if (numRunningTasks != 0) {
> logWarning("No stages are running, but numRunningTasks != 0")
> numRunningTasks = 0
>   }
> }
>   }
> }
> {code}
> But when the stage's still-running tasks finish, numRunningTasks is decremented by 
> 1 for each of them, so numRunningTasks becomes negative, which can make maxNeeded 
> negative.
> {code}
> override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
>   val executorId = taskEnd.taskInfo.executorId
>   val taskId = taskEnd.taskInfo.taskId
>   val taskIndex = taskEnd.taskInfo.index
>   val stageId = taskEnd.stageId
>   allocationManager.synchronized {
> numRunningTasks -= 1
> {code}
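
A minimal Scala sketch of one possible guard (an illustration of the idea, not the 
actual fix): drop task-end events that arrive after the stage-completed reset, so 
the counter can never go below zero:
{code}
object RunningTaskCounterSketch {
  private var numRunningTasks = 0

  def onTaskStart(): Unit = synchronized { numRunningTasks += 1 }

  def onStageCompleted(): Unit = synchronized {
    // Same reset that ExecutorAllocationManager does when no stages are pending.
    numRunningTasks = 0
  }

  def onTaskEnd(): Unit = synchronized {
    if (numRunningTasks > 0) {
      numRunningTasks -= 1
    } else {
      // A late task-end after the reset; ignoring it keeps the counter (and
      // anything derived from it, like maxNeeded) non-negative.
      Console.err.println("Ignoring task-end: numRunningTasks is already 0")
    }
  }
}
{code}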



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16708) ExecutorAllocationManager.numRunningTasks can be negative when stage retry

2016-07-25 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-16708:
--
Description: 
When a task fetch fails, the stage completes and is retried. When the stage 
completes, 
 if (stageIdToNumTasks.isEmpty) {
  allocationManager.onSchedulerQueueEmpty()
  if (numRunningTasks != 0) {
logWarning("No stages are running, but numRunningTasks != 0")
numRunningTasks = 0
  }
}

> ExecutorAllocationManager.numRunningTasks can be negative when stage retry
> --
>
> Key: SPARK-16708
> URL: https://issues.apache.org/jira/browse/SPARK-16708
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> When a task fetch fails, the stage completes and is retried. When the stage 
> completes, 
>  if (stageIdToNumTasks.isEmpty) {
>   allocationManager.onSchedulerQueueEmpty()
>   if (numRunningTasks != 0) {
> logWarning("No stages are running, but numRunningTasks != 0")
> numRunningTasks = 0
>   }
> }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16708) ExecutorAllocationManager.numRunningTasks can be negative when stage retry

2016-07-25 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-16708:
--
Affects Version/s: 1.6.0

> ExecutorAllocationManager.numRunningTasks can be negative when stage retry
> --
>
> Key: SPARK-16708
> URL: https://issues.apache.org/jira/browse/SPARK-16708
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16708) ExecutorAllocationManager.numRunningTasks can be negative when stage retry

2016-07-25 Thread Hong Shen (JIRA)
Hong Shen created SPARK-16708:
-

 Summary: ExecutorAllocationManager.numRunningTasks can be negative 
when stage retry
 Key: SPARK-16708
 URL: https://issues.apache.org/jira/browse/SPARK-16708
 Project: Spark
  Issue Type: Bug
Reporter: Hong Shen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16707) TransportClientFactory.createClient may throw NPE

2016-07-25 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391596#comment-15391596
 ] 

Hong Shen commented on SPARK-16707:
---

The same error is reported here:
http://stackoverflow.com/questions/26390968/why-does-the-channel-pipeline-get-returns-a-null-handler-in-netty4

> TransportClientFactory.createClient may throw NPE
> -
>
> Key: SPARK-16707
> URL: https://issues.apache.org/jira/browse/SPARK-16707
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> I have encountered some NullPointerExceptions in 
> TransportClientFactory.createClient in my cluster; here is the 
> stack trace.
> {code}
> org.apache.spark.shuffle.FetchFailedException
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:326)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:303)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:53)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:511)
>   at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:686)
>   at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
>   at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:215)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:144)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleClient$1.createAndStart(ExternalShuffleClient.java:107)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:146)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:126)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleClient.fetchBlocks(ExternalShuffleClient.java:116)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:155)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchUpToMaxBytes(ShuffleBlockFetcherIterator.scala:319)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:299)
>   ... 32 more
> {code}
> The code is at 
> 

[jira] [Updated] (SPARK-16707) TransportClientFactory.createClient may throw NPE

2016-07-25 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-16707:
--
Description: 
I have encountered some NullPointerExceptions in 
TransportClientFactory.createClient in my cluster; here is the stack 
trace.
{code}
org.apache.spark.shuffle.FetchFailedException
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:326)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:303)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:53)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:511)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:686)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:215)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.NullPointerException
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:144)
at 
org.apache.spark.network.shuffle.ExternalShuffleClient$1.createAndStart(ExternalShuffleClient.java:107)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:146)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:126)
at 
org.apache.spark.network.shuffle.ExternalShuffleClient.fetchBlocks(ExternalShuffleClient.java:116)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:155)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchUpToMaxBytes(ShuffleBlockFetcherIterator.scala:319)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:299)
... 32 more
{code}

The code is at 
https://github.com/apache/spark/blob/branch-1.6/network/common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java#L144
{code}
  TransportChannelHandler handler = cachedClient.getChannel().pipeline()
.get(TransportChannelHandler.class);
  synchronized (handler) {
{code}
synchronized (handler) throws a NullPointerException because handler is null.
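
A hedged sketch in Scala of the kind of guard that would avoid the NPE (not the 
actual Spark patch; it only relies on the getChannel()/pipeline() calls already 
shown above): treat a cached client whose pipeline no longer contains a 
TransportChannelHandler as dead and let the caller build a new one:
{code}
import org.apache.spark.network.client.TransportClient
import org.apache.spark.network.server.TransportChannelHandler

object CachedClientGuardSketch {
  // Returns the cached client only if its channel still has a live handler.
  def reuseIfAlive(cachedClient: TransportClient): Option[TransportClient] = {
    val handler = cachedClient.getChannel.pipeline()
      .get(classOf[TransportChannelHandler])
    if (handler == null) {
      // The channel was closed/cleaned up concurrently; skip the cached entry
      // instead of hitting synchronized(null).
      None
    } else handler.synchronized {
      Some(cachedClient)
    }
  }
}
{code}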



  was:
I have encountered some NullPointerExceptions in 
TransportClientFactory.createClient in my cluster; here is the stack 
trace.
{code}
org.apache.spark.shuffle.FetchFailedException
at 

[jira] [Updated] (SPARK-16707) TransportClientFactory.createClient may throw NPE

2016-07-25 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-16707:
--
Description: 
I have encountered some NullPointerExceptions in 
TransportClientFactory.createClient in my cluster; here is the stack 
trace.
{code}
org.apache.spark.shuffle.FetchFailedException
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:326)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:303)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:53)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:511)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:686)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:215)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.NullPointerException
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:144)
at 
org.apache.spark.network.shuffle.ExternalShuffleClient$1.createAndStart(ExternalShuffleClient.java:107)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:146)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:126)
at 
org.apache.spark.network.shuffle.ExternalShuffleClient.fetchBlocks(ExternalShuffleClient.java:116)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:155)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchUpToMaxBytes(ShuffleBlockFetcherIterator.scala:319)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:299)
... 32 more
{code}

The code is at 
https://github.com/apache/spark/blob/branch-1.6/network/common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java#L144
{code}
if (cachedClient != null && cachedClient.isActive()) {
  // Make sure that the channel will not timeout by updating the last use 
time of the
  // handler. Then check that the client is still alive, in case it timed 
out before
  // this code was able to update things.
  TransportChannelHandler handler = cachedClient.getChannel().pipeline()
.get(TransportChannelHandler.class);
{color:red}
  synchronized (handler) {
{color}

[jira] [Updated] (SPARK-16707) TransportClientFactory.createClient may throw NPE

2016-07-25 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-16707:
--
Description: 
I have encounter  some NullPointerException when 
TransportClientFactory.createClient in my cluster, here is the following stack 
trace.
{code}
org.apache.spark.shuffle.FetchFailedException
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:326)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:303)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:53)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:511)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:686)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:215)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.NullPointerException
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:144)
at 
org.apache.spark.network.shuffle.ExternalShuffleClient$1.createAndStart(ExternalShuffleClient.java:107)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:146)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:126)
at 
org.apache.spark.network.shuffle.ExternalShuffleClient.fetchBlocks(ExternalShuffleClient.java:116)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:155)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchUpToMaxBytes(ShuffleBlockFetcherIterator.scala:319)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:299)
... 32 more
{code}

The code is at 
https://github.com/apache/spark/blob/branch-1.6/network/common/src/main/java/org/apache/spark/network/client/TransportClientFactory.java#L144


  was:

org.apache.spark.shuffle.FetchFailedException
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:326)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:303)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:53)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

[jira] [Updated] (SPARK-16707) TransportClientFactory.createClient may throw NPE

2016-07-25 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-16707:
--
Affects Version/s: 1.6.0

> TransportClientFactory.createClient may throw NPE
> -
>
> Key: SPARK-16707
> URL: https://issues.apache.org/jira/browse/SPARK-16707
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> org.apache.spark.shuffle.FetchFailedException
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:326)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:303)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:53)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:511)
>   at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:686)
>   at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
>   at 
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:89)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:215)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:144)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleClient$1.createAndStart(ExternalShuffleClient.java:107)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:146)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:126)
>   at 
> org.apache.spark.network.shuffle.ExternalShuffleClient.fetchBlocks(ExternalShuffleClient.java:116)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:155)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchUpToMaxBytes(ShuffleBlockFetcherIterator.scala:319)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:299)
>   ... 32 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16707) TransportClientFactory.createClient may throw NPE

2016-07-25 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-16707:
--
Description: 

org.apache.spark.shuffle.FetchFailedException
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:326)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:303)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:53)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:511)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.<init>(TungstenAggregationIterator.scala:686)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:95)
at 
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:215)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.NullPointerException
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:144)
at 
org.apache.spark.network.shuffle.ExternalShuffleClient$1.createAndStart(ExternalShuffleClient.java:107)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:146)
at 
org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:126)
at 
org.apache.spark.network.shuffle.ExternalShuffleClient.fetchBlocks(ExternalShuffleClient.java:116)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:155)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.fetchUpToMaxBytes(ShuffleBlockFetcherIterator.scala:319)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:299)
... 32 more


> TransportClientFactory.createClient may throw NPE
> -
>
> Key: SPARK-16707
> URL: https://issues.apache.org/jira/browse/SPARK-16707
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> org.apache.spark.shuffle.FetchFailedException
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:326)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:303)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:53)
>   at 

[jira] [Created] (SPARK-16707) TransportClientFactory.createClient may throw NPE

2016-07-25 Thread Hong Shen (JIRA)
Hong Shen created SPARK-16707:
-

 Summary: TransportClientFactory.createClient may throw NPE
 Key: SPARK-16707
 URL: https://issues.apache.org/jira/browse/SPARK-16707
 Project: Spark
  Issue Type: Bug
Reporter: Hong Shen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13510) Shuffle may throw FetchFailedException: Direct buffer memory

2016-06-20 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-13510:
--
Attachment: spark-13510.diff

Add a FetchStream for shuffle: if the fetched block is bigger than 
spark.shuffle.max.block.size.inmemory, use FetchStream to fetch the block to 
disk. This resolves the direct buffer memory exception.
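
A minimal Scala sketch of the spill-to-disk idea (an illustration only, not the 
attached spark-13510.diff; spark.shuffle.max.block.size.inmemory is the custom 
setting mentioned above): buffer small blocks in memory, but stream anything larger 
straight to a local file so no ~1 GB direct buffer is ever allocated:
{code}
import java.io.{ByteArrayOutputStream, File, InputStream}
import java.nio.file.{Files, StandardCopyOption}

object FetchToDiskSketch {
  // Either the block bytes in memory, or a temp file holding the spilled block.
  def fetchBlock(blockId: String, blockSize: Long, open: () => InputStream,
                 maxInMemoryBytes: Long): Either[Array[Byte], File] = {
    val in = open()
    try {
      if (blockSize <= maxInMemoryBytes) {
        // Small block: keep it in memory as before.
        val out = new ByteArrayOutputStream(blockSize.toInt)
        val buf = new Array[Byte](64 * 1024)
        Iterator.continually(in.read(buf)).takeWhile(_ != -1)
          .foreach(n => out.write(buf, 0, n))
        Left(out.toByteArray)
      } else {
        // Big block (e.g. ~1 GB): copy the stream to disk instead of allocating
        // a matching direct buffer, which is what triggers the OOM above.
        val file = File.createTempFile(s"shuffle-$blockId-", ".data")
        Files.copy(in, file.toPath, StandardCopyOption.REPLACE_EXISTING)
        Right(file)
      }
    } finally {
      in.close()
    }
  }
}
{code}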


> Shuffle may throw FetchFailedException: Direct buffer memory
> 
>
> Key: SPARK-13510
> URL: https://issues.apache.org/jira/browse/SPARK-13510
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Hong Shen
> Attachments: spark-13510.diff
>
>
> In our cluster, when I test spark-1.6.0 with a SQL query, it throws an exception 
> and fails.
> {code}
> 16/02/17 15:36:03 INFO storage.ShuffleBlockFetcherIterator: Sending request 
> for 1 blocks (915.4 MB) from 10.196.134.220:7337
> 16/02/17 15:36:03 INFO shuffle.ExternalShuffleClient: External shuffle fetch 
> from 10.196.134.220:7337 (executor id 122)
> 16/02/17 15:36:03 INFO client.TransportClient: Sending fetch chunk request 0 
> to /10.196.134.220:7337
> 16/02/17 15:36:36 WARN server.TransportChannelHandler: Exception in 
> connection from /10.196.134.220:7337
> java.lang.OutOfMemoryError: Direct buffer memory
>   at java.nio.Bits.reserveMemory(Bits.java:658)
>   at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
>   at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)
>   at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:212)
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
>   at 
> io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
>   at 
> io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:744)
> 16/02/17 15:36:36 ERROR client.TransportResponseHandler: Still have 1 
> requests outstanding when connection from /10.196.134.220:7337 is closed
> 16/02/17 15:36:36 ERROR shuffle.RetryingBlockFetcher: Failed to fetch block 
> shuffle_3_81_2, and will not retry (0 retries)
> {code}
>   The reason is that when shuffling a big block (like 1 GB), the task will 
> allocate the same amount of memory, so it easily throws "FetchFailedException: 
> Direct buffer memory".
>   If I add -Dio.netty.noUnsafe=true to spark.executor.extraJavaOptions, it 
> throws 
> {code}
> java.lang.OutOfMemoryError: Java heap space
> at 
> io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
> at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
> at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
> at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
> {code}
>   
>   In MapReduce shuffle, it first judges whether the block can be cached in 
> memory, but Spark doesn't. 
>   If the block is bigger than what we can cache in memory, we should write it to disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13510) Shuffle may throw FetchFailedException: Direct buffer memory

2016-03-01 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173675#comment-15173675
 ] 

Hong Shen commented on SPARK-13510:
---

I have resolved this in our own edition; I will add a pull request this weekend.

> Shuffle may throw FetchFailedException: Direct buffer memory
> 
>
> Key: SPARK-13510
> URL: https://issues.apache.org/jira/browse/SPARK-13510
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> In our cluster, when I test spark-1.6.0 with a SQL query, it throws an exception 
> and fails.
> {code}
> 16/02/17 15:36:03 INFO storage.ShuffleBlockFetcherIterator: Sending request 
> for 1 blocks (915.4 MB) from 10.196.134.220:7337
> 16/02/17 15:36:03 INFO shuffle.ExternalShuffleClient: External shuffle fetch 
> from 10.196.134.220:7337 (executor id 122)
> 16/02/17 15:36:03 INFO client.TransportClient: Sending fetch chunk request 0 
> to /10.196.134.220:7337
> 16/02/17 15:36:36 WARN server.TransportChannelHandler: Exception in 
> connection from /10.196.134.220:7337
> java.lang.OutOfMemoryError: Direct buffer memory
>   at java.nio.Bits.reserveMemory(Bits.java:658)
>   at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
>   at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)
>   at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:212)
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
>   at 
> io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
>   at 
> io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:744)
> 16/02/17 15:36:36 ERROR client.TransportResponseHandler: Still have 1 
> requests outstanding when connection from /10.196.134.220:7337 is closed
> 16/02/17 15:36:36 ERROR shuffle.RetryingBlockFetcher: Failed to fetch block 
> shuffle_3_81_2, and will not retry (0 retries)
> {code}
>   The reason is that when shuffling a big block (like 1 GB), the task will 
> allocate the same amount of memory, so it easily throws "FetchFailedException: 
> Direct buffer memory".
>   If I add -Dio.netty.noUnsafe=true to spark.executor.extraJavaOptions, it 
> throws 
> {code}
> java.lang.OutOfMemoryError: Java heap space
> at 
> io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
> at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
> at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
> at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
> {code}
>   
>   In MapReduce shuffle, it first judges whether the block can be cached in 
> memory, but Spark doesn't. 
>   If the block is bigger than what we can cache in memory, we should write it to disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13510) Shuffle may throw FetchFailedException: Direct buffer memory

2016-03-01 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-13510:
--
Description: 
In our cluster, when I test spark-1.6.0 with a SQL query, it throws an exception 
and fails.
{code}
16/02/17 15:36:03 INFO storage.ShuffleBlockFetcherIterator: Sending request for 
1 blocks (915.4 MB) from 10.196.134.220:7337
16/02/17 15:36:03 INFO shuffle.ExternalShuffleClient: External shuffle fetch 
from 10.196.134.220:7337 (executor id 122)
16/02/17 15:36:03 INFO client.TransportClient: Sending fetch chunk request 0 to 
/10.196.134.220:7337
16/02/17 15:36:36 WARN server.TransportChannelHandler: Exception in connection 
from /10.196.134.220:7337
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:212)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
at 
io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)
at 
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
at 
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
at 
io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
at 
io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:744)
16/02/17 15:36:36 ERROR client.TransportResponseHandler: Still have 1 requests 
outstanding when connection from /10.196.134.220:7337 is closed
16/02/17 15:36:36 ERROR shuffle.RetryingBlockFetcher: Failed to fetch block 
shuffle_3_81_2, and will not retry (0 retries)
{code}
  The reason is that when shuffling a big block (like 1 GB), the task will 
allocate the same amount of memory, so it can easily throw 
"FetchFailedException: Direct buffer memory".
  If I add -Dio.netty.noUnsafe=true to spark.executor.extraJavaOptions, it will 
instead throw 
{code}
java.lang.OutOfMemoryError: Java heap space
at 
io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
{code}
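  As an aside, the same JVM option can also be set programmatically; the snippet 
below is only an illustrative sketch (the application name is made up and is not 
part of the original job), equivalent to setting it in spark-defaults.conf or on 
spark-submit:
{code}
// Sketch: pass -Dio.netty.noUnsafe=true to executors through SparkConf.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("shuffle-direct-buffer-repro") // made-up name, for illustration only
  // Netty then allocates on the Java heap instead of direct memory, which only
  // moves the OOM from direct buffers to the heap, as the trace above shows.
  .set("spark.executor.extraJavaOptions", "-Dio.netty.noUnsafe=true")
{code}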
  
  In the MapReduce shuffle, the fetcher first judges whether the block can be 
cached in memory, but Spark doesn't. 
  If the block is bigger than what we can cache in memory, we should write it to 
disk.
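  A rough sketch of that idea follows. The names and the 48 MB threshold are made 
up, and this is not the real ShuffleBlockFetcherIterator code; it only shows the 
approach of keeping small blocks in memory and streaming anything larger straight 
to a local file.
{code}
// Sketch only: keep a small fetched block in memory, spill a big one to disk so
// the task never needs a buffer as large as the block itself.
import java.io.{EOFException, File, FileOutputStream, InputStream}

sealed trait FetchedBlock
case class InMemoryBlock(bytes: Array[Byte]) extends FetchedBlock
case class OnDiskBlock(file: File) extends FetchedBlock

def fetchBlock(blockId: String,
               blockSize: Long,
               open: () => InputStream,
               maxInMemoryBytes: Long = 48L * 1024 * 1024): FetchedBlock = {
  val in = open()
  try {
    if (blockSize <= maxInMemoryBytes) {
      // Small block: buffer it fully in memory, as the current code does.
      val buf = new Array[Byte](blockSize.toInt)
      var off = 0
      while (off < buf.length) {
        val n = in.read(buf, off, buf.length - off)
        if (n < 0) throw new EOFException(s"unexpected end of block $blockId")
        off += n
      }
      InMemoryBlock(buf)
    } else {
      // Big block (like 1 GB): copy it to disk in 64 KB chunks instead of
      // allocating a direct (or heap) buffer of the same size.
      val file = File.createTempFile(s"shuffle-$blockId-", ".data")
      val out = new FileOutputStream(file)
      try {
        val chunk = new Array[Byte](64 * 1024)
        var n = in.read(chunk)
        while (n >= 0) {
          out.write(chunk, 0, n)
          n = in.read(chunk)
        }
      } finally {
        out.close()
      }
      OnDiskBlock(file)
    }
  } finally {
    in.close()
  }
}
{code}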


  was:
In our cluster, when I test spark-1.6.0 with a sql, it throw exception and 
failed.
{code}
16/02/17 15:36:03 INFO storage.ShuffleBlockFetcherIterator: Sending request for 
1 blocks (915.4 MB) from 10.196.134.220:7337
16/02/17 15:36:03 INFO shuffle.ExternalShuffleClient: External shuffle fetch 
from 10.196.134.220:7337 (executor id 122)
16/02/17 15:36:03 INFO client.TransportClient: Sending fetch chunk request 0 to 
/10.196.134.220:7337
16/02/17 15:36:36 WARN server.TransportChannelHandler: Exception in connection 
from /10.196.134.220:7337
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:212)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
at 
io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)
at 
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
at 
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
at 
io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
at 

[jira] [Updated] (SPARK-13510) Shuffle may throw FetchFailedException: Direct buffer memory

2016-03-01 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-13510:
--
Description: 
In our cluster, when I test spark-1.6.0 with a SQL query, it throws an exception 
and fails.
{code}
16/02/17 15:36:03 INFO storage.ShuffleBlockFetcherIterator: Sending request for 
1 blocks (915.4 MB) from 10.196.134.220:7337
16/02/17 15:36:03 INFO shuffle.ExternalShuffleClient: External shuffle fetch 
from 10.196.134.220:7337 (executor id 122)
16/02/17 15:36:03 INFO client.TransportClient: Sending fetch chunk request 0 to 
/10.196.134.220:7337
16/02/17 15:36:36 WARN server.TransportChannelHandler: Exception in connection 
from /10.196.134.220:7337
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:212)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
at 
io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)
at 
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
at 
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
at 
io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
at 
io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:744)
16/02/17 15:36:36 ERROR client.TransportResponseHandler: Still have 1 requests 
outstanding when connection from /10.196.134.220:7337 is closed
16/02/17 15:36:36 ERROR shuffle.RetryingBlockFetcher: Failed to fetch block 
shuffle_3_81_2, and will not retry (0 retries)
{code}
  The reason is that when shuffling a big block (like 1 GB), the task will 
allocate the same amount of memory, so it can easily throw 
"FetchFailedException: Direct buffer memory".
  If I add -Dio.netty.noUnsafe=true to spark.executor.extraJavaOptions, it will 
instead throw 
{code}
java.lang.OutOfMemoryError: Java heap space
at 
io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
{code}
  
  In the MapReduce shuffle, the fetcher first judges whether the block can be 
cached in memory, but Spark doesn't. 
  If the block is more


  was:
In our cluster, when I test spark-1.6.0 with a sql, it throw exception and 
failed.
{code}
16/02/17 15:36:03 INFO storage.ShuffleBlockFetcherIterator: Sending request for 
1 blocks (915.4 MB) from 10.196.134.220:7337
16/02/17 15:36:03 INFO shuffle.ExternalShuffleClient: External shuffle fetch 
from 10.196.134.220:7337 (executor id 122)
16/02/17 15:36:03 INFO client.TransportClient: Sending fetch chunk request 0 to 
/10.196.134.220:7337
16/02/17 15:36:36 WARN server.TransportChannelHandler: Exception in connection 
from /10.196.134.220:7337
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:212)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
at 
io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)
at 
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
at 
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
at 
io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
at 
io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
at 

[jira] [Commented] (SPARK-13510) Shuffle may throw FetchFailedException: Direct buffer memory

2016-03-01 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173658#comment-15173658
 ] 

Hong Shen commented on SPARK-13510:
---

We can't resolve all of the shuffle OOMs by allocating more memory. If a reduce 
task shuffles a block of more than 5 GB, that block shouldn't be stored in memory.

> Shuffle may throw FetchFailedException: Direct buffer memory
> 
>
> Key: SPARK-13510
> URL: https://issues.apache.org/jira/browse/SPARK-13510
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> In our cluster, when I test spark-1.6.0 with a SQL query, it throws an 
> exception and fails.
> {code}
> 16/02/17 15:36:03 INFO storage.ShuffleBlockFetcherIterator: Sending request 
> for 1 blocks (915.4 MB) from 10.196.134.220:7337
> 16/02/17 15:36:03 INFO shuffle.ExternalShuffleClient: External shuffle fetch 
> from 10.196.134.220:7337 (executor id 122)
> 16/02/17 15:36:03 INFO client.TransportClient: Sending fetch chunk request 0 
> to /10.196.134.220:7337
> 16/02/17 15:36:36 WARN server.TransportChannelHandler: Exception in 
> connection from /10.196.134.220:7337
> java.lang.OutOfMemoryError: Direct buffer memory
>   at java.nio.Bits.reserveMemory(Bits.java:658)
>   at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
>   at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)
>   at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:212)
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
>   at 
> io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
>   at 
> io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:744)
> 16/02/17 15:36:36 ERROR client.TransportResponseHandler: Still have 1 
> requests outstanding when connection from /10.196.134.220:7337 is closed
> 16/02/17 15:36:36 ERROR shuffle.RetryingBlockFetcher: Failed to fetch block 
> shuffle_3_81_2, and will not retry (0 retries)
> {code}
>   The reason is that when shuffling a big block (like 1 GB), the task will 
> allocate the same amount of memory, so it can easily throw 
> "FetchFailedException: Direct buffer memory".
>   If I add -Dio.netty.noUnsafe=true to spark.executor.extraJavaOptions, it 
> will instead throw 
> {code}
> java.lang.OutOfMemoryError: Java heap space
> at 
> io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
> at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
> at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
> at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
> {code}
>   
>   In the MapReduce shuffle, the fetcher first judges whether the block can be 
> cached in memory, but Spark doesn't. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13510) Shuffle may throw FetchFailedException: Direct buffer memory

2016-03-01 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173648#comment-15173648
 ] 

Hong Shen commented on SPARK-13510:
---

In our cluster, we have a lot of SQL jobs running on Hive, and I want to use Spark 
SQL to replace Hive.
But many of those queries read more than 10 TB of input, so a shuffle block can be 
more than 5 GB.
When I run some of them on Spark SQL, they fail because of shuffle OOM, and I can't 
allocate enough memory to resolve all of the failed queries.

> Shuffle may throw FetchFailedException: Direct buffer memory
> 
>
> Key: SPARK-13510
> URL: https://issues.apache.org/jira/browse/SPARK-13510
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> In our cluster, when I test spark-1.6.0 with a SQL query, it throws an 
> exception and fails.
> {code}
> 16/02/17 15:36:03 INFO storage.ShuffleBlockFetcherIterator: Sending request 
> for 1 blocks (915.4 MB) from 10.196.134.220:7337
> 16/02/17 15:36:03 INFO shuffle.ExternalShuffleClient: External shuffle fetch 
> from 10.196.134.220:7337 (executor id 122)
> 16/02/17 15:36:03 INFO client.TransportClient: Sending fetch chunk request 0 
> to /10.196.134.220:7337
> 16/02/17 15:36:36 WARN server.TransportChannelHandler: Exception in 
> connection from /10.196.134.220:7337
> java.lang.OutOfMemoryError: Direct buffer memory
>   at java.nio.Bits.reserveMemory(Bits.java:658)
>   at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
>   at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)
>   at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:212)
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
>   at 
> io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
>   at 
> io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:744)
> 16/02/17 15:36:36 ERROR client.TransportResponseHandler: Still have 1 
> requests outstanding when connection from /10.196.134.220:7337 is closed
> 16/02/17 15:36:36 ERROR shuffle.RetryingBlockFetcher: Failed to fetch block 
> shuffle_3_81_2, and will not retry (0 retries)
> {code}
>   The reason is that when shuffling a big block (like 1 GB), the task will 
> allocate the same amount of memory, so it can easily throw 
> "FetchFailedException: Direct buffer memory".
>   If I add -Dio.netty.noUnsafe=true to spark.executor.extraJavaOptions, it 
> will instead throw 
> {code}
> java.lang.OutOfMemoryError: Java heap space
> at 
> io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
> at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
> at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
> at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
> {code}
>   
>   In the MapReduce shuffle, the fetcher first judges whether the block can be 
> cached in memory, but Spark doesn't. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-13510) Shuffle may throw FetchFailedException: Direct buffer memory

2016-03-01 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen reopened SPARK-13510:
---

> Shuffle may throw FetchFailedException: Direct buffer memory
> 
>
> Key: SPARK-13510
> URL: https://issues.apache.org/jira/browse/SPARK-13510
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> In our cluster, when I test spark-1.6.0 with a SQL query, it throws an 
> exception and fails.
> {code}
> 16/02/17 15:36:03 INFO storage.ShuffleBlockFetcherIterator: Sending request 
> for 1 blocks (915.4 MB) from 10.196.134.220:7337
> 16/02/17 15:36:03 INFO shuffle.ExternalShuffleClient: External shuffle fetch 
> from 10.196.134.220:7337 (executor id 122)
> 16/02/17 15:36:03 INFO client.TransportClient: Sending fetch chunk request 0 
> to /10.196.134.220:7337
> 16/02/17 15:36:36 WARN server.TransportChannelHandler: Exception in 
> connection from /10.196.134.220:7337
> java.lang.OutOfMemoryError: Direct buffer memory
>   at java.nio.Bits.reserveMemory(Bits.java:658)
>   at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
>   at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)
>   at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:212)
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
>   at 
> io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
>   at 
> io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:744)
> 16/02/17 15:36:36 ERROR client.TransportResponseHandler: Still have 1 
> requests outstanding when connection from /10.196.134.220:7337 is closed
> 16/02/17 15:36:36 ERROR shuffle.RetryingBlockFetcher: Failed to fetch block 
> shuffle_3_81_2, and will not retry (0 retries)
> {code}
>   The reason is that when shuffling a big block (like 1 GB), the task will 
> allocate the same amount of memory, so it can easily throw 
> "FetchFailedException: Direct buffer memory".
>   If I add -Dio.netty.noUnsafe=true to spark.executor.extraJavaOptions, it 
> will instead throw 
> {code}
> java.lang.OutOfMemoryError: Java heap space
> at 
> io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
> at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
> at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
> at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
> {code}
>   
>   In the MapReduce shuffle, the fetcher first judges whether the block can be 
> cached in memory, but Spark doesn't. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13510) Shuffle may throw FetchFailedException: Direct buffer memory

2016-02-28 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171330#comment-15171330
 ] 

Hong Shen commented on SPARK-13510:
---

Adding more detail from the log: the fetch was for a single block of 915.4 MB.
16/02/17 15:36:03 INFO storage.ShuffleBlockFetcherIterator: Sending request for 
1 blocks (915.4 MB) from 10.196.134.220:7337


> Shuffle may throw FetchFailedException: Direct buffer memory
> 
>
> Key: SPARK-13510
> URL: https://issues.apache.org/jira/browse/SPARK-13510
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> In our cluster, when I test spark-1.6.0 with a SQL query, it throws an 
> exception and fails.
> {code}
> 16/02/17 15:36:03 INFO storage.ShuffleBlockFetcherIterator: Sending request 
> for 1 blocks (915.4 MB) from 10.196.134.220:7337
> 16/02/17 15:36:03 INFO shuffle.ExternalShuffleClient: External shuffle fetch 
> from 10.196.134.220:7337 (executor id 122)
> 16/02/17 15:36:03 INFO client.TransportClient: Sending fetch chunk request 0 
> to /10.196.134.220:7337
> 16/02/17 15:36:36 WARN server.TransportChannelHandler: Exception in 
> connection from /10.196.134.220:7337
> java.lang.OutOfMemoryError: Direct buffer memory
>   at java.nio.Bits.reserveMemory(Bits.java:658)
>   at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
>   at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
>   at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)
>   at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:212)
>   at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
>   at 
> io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
>   at 
> io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
>   at 
> io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>   at java.lang.Thread.run(Thread.java:744)
> 16/02/17 15:36:36 ERROR client.TransportResponseHandler: Still have 1 
> requests outstanding when connection from /10.196.134.220:7337 is closed
> 16/02/17 15:36:36 ERROR shuffle.RetryingBlockFetcher: Failed to fetch block 
> shuffle_3_81_2, and will not retry (0 retries)
> {code}
>   The reason is that when shuffling a big block (like 1 GB), the task will 
> allocate the same amount of memory, so it can easily throw 
> "FetchFailedException: Direct buffer memory".
>   If I add -Dio.netty.noUnsafe=true to spark.executor.extraJavaOptions, it 
> will instead throw 
> {code}
> java.lang.OutOfMemoryError: Java heap space
> at 
> io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
> at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
> at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
> at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
> {code}
>   
>   In the MapReduce shuffle, the fetcher first judges whether the block can be 
> cached in memory, but Spark doesn't. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13510) Shuffle may throw FetchFailedException: Direct buffer memory

2016-02-27 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-13510:
--
Description: 
In our cluster, when I test spark-1.6.0 with a SQL query, it throws an exception 
and fails.
{code}
16/02/17 15:36:03 INFO storage.ShuffleBlockFetcherIterator: Sending request for 
1 blocks (915.4 MB) from 10.196.134.220:7337
16/02/17 15:36:03 INFO shuffle.ExternalShuffleClient: External shuffle fetch 
from 10.196.134.220:7337 (executor id 122)
16/02/17 15:36:03 INFO client.TransportClient: Sending fetch chunk request 0 to 
/10.196.134.220:7337
16/02/17 15:36:36 WARN server.TransportChannelHandler: Exception in connection 
from /10.196.134.220:7337
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:645)
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:228)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:212)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
at 
io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:271)
at 
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:155)
at 
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:146)
at 
io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:107)
at 
io.netty.channel.AdaptiveRecvByteBufAllocator$HandleImpl.allocate(AdaptiveRecvByteBufAllocator.java:104)
at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:117)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:744)
16/02/17 15:36:36 ERROR client.TransportResponseHandler: Still have 1 requests 
outstanding when connection from /10.196.134.220:7337 is closed
16/02/17 15:36:36 ERROR shuffle.RetryingBlockFetcher: Failed to fetch block 
shuffle_3_81_2, and will not retry (0 retries)
{code}
  The reason is that when shuffling a big block (like 1 GB), the task will 
allocate the same amount of memory, so it can easily throw 
"FetchFailedException: Direct buffer memory".
  If I add -Dio.netty.noUnsafe=true to spark.executor.extraJavaOptions, it will 
instead throw 
{code}
java.lang.OutOfMemoryError: Java heap space
at 
io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
{code}
  
  In the MapReduce shuffle, the fetcher first judges whether the block can be 
cached in memory, but Spark doesn't. 


  was:
In our cluster, when I test spark-1.6.0 with a sql, it throw exception and 
failed.
{code}
org.apache.spark.shuffle.FetchFailedException: Direct buffer memory
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:323)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:300)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:167)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:759)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:759)
{code}
  The reason is that when shuffle a big block(like 1G), task will allocate the 
same memory, it will easily throw "FetchFailedException: Direct 

[jira] [Commented] (SPARK-13510) Shuffle may throw FetchFailedException: Direct buffer memory

2016-02-26 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168693#comment-15168693
 ] 

Hong Shen commented on SPARK-13510:
---

  I will add this logic in my own version.

> Shuffle may throw FetchFailedException: Direct buffer memory
> 
>
> Key: SPARK-13510
> URL: https://issues.apache.org/jira/browse/SPARK-13510
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> In our cluster, when I test spark-1.6.0 with a SQL query, it throws an 
> exception and fails.
> {code}
> org.apache.spark.shuffle.FetchFailedException: Direct buffer memory
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:323)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:300)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:167)
>   at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
>   at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:759)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:759)
> {code}
>   The reason is that when shuffling a big block (like 1 GB), the task will 
> allocate the same amount of memory, so it can easily throw 
> "FetchFailedException: Direct buffer memory".
>   If I add -Dio.netty.noUnsafe=true to spark.executor.extraJavaOptions, it 
> will instead throw 
> {code}
> java.lang.OutOfMemoryError: Java heap space
> at 
> io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
> at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
> at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
> at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
> {code}
>   
>   In the MapReduce shuffle, the fetcher first judges whether the block can be 
> cached in memory, but Spark doesn't. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13510) Shuffle may throw FetchFailedException: Direct buffer memory

2016-02-26 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-13510:
--
Description: 
In our cluster, when I test spark-1.6.0 with a SQL query, it throws an exception 
and fails.
{code}
org.apache.spark.shuffle.FetchFailedException: Direct buffer memory
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:323)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:300)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:167)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:759)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:759)
{code}
  The reason is that when shuffling a big block (like 1 GB), the task will 
allocate the same amount of memory, so it can easily throw 
"FetchFailedException: Direct buffer memory".
  If I add -Dio.netty.noUnsafe=true to spark.executor.extraJavaOptions, it will 
instead throw 
{code}
java.lang.OutOfMemoryError: Java heap space
at 
io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
{code}
  
  In the MapReduce shuffle, the fetcher first judges whether the block can be 
cached in memory, but Spark doesn't. 


  was:
In our cluster, when I test spark-1.6.0 with a sql, it throw exception and 
failed.
{code}
org.apache.spark.shuffle.FetchFailedException: Direct buffer memory
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:323)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:300)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:167)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:759)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:759)
{code}
  The reason is that when shuffle a big block(like 1G), task will allocate the 
same memory, it will easily throw "FetchFailedException: Direct buffer memory".
  If I add -Dio.netty.noUnsafe=true spark.executor.extraJavaOptions, it will 
throw 
{code}
java.lang.OutOfMemoryError: Java heap space
at 
io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
{code}
  
  In mapreduce shuffle, it will firstly judge whether the block can cache in 
memery, but spark doesn't. 
  I will add the logic in my edition.


> Shuffle may throw FetchFailedException: Direct buffer memory
> 
>
> Key: SPARK-13510
> URL: https://issues.apache.org/jira/browse/SPARK-13510
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
> In our 

[jira] [Created] (SPARK-13510) Shuffle may throw FetchFailedException: Direct buffer memory

2016-02-26 Thread Hong Shen (JIRA)
Hong Shen created SPARK-13510:
-

 Summary: Shuffle may throw FetchFailedException: Direct buffer 
memory
 Key: SPARK-13510
 URL: https://issues.apache.org/jira/browse/SPARK-13510
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.6.0
Reporter: Hong Shen


In our cluster, when I test spark-1.6.0 with a SQL query, it throws an exception 
and fails.
{code}
org.apache.spark.shuffle.FetchFailedException: Direct buffer memory
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:323)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:300)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:167)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:759)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:759)
{code}
  The reason is that when shuffling a big block (like 1 GB), the task will 
allocate the same amount of memory, so it can easily throw 
"FetchFailedException: Direct buffer memory".
  If I add -Dio.netty.noUnsafe=true to spark.executor.extraJavaOptions, it will 
instead throw 
{code}
java.lang.OutOfMemoryError: Java heap space
at 
io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
{code}
  
  In the MapReduce shuffle, the fetcher first judges whether the block can be 
cached in memory, but Spark doesn't. 
  I will add this logic in my own version.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13510) Shuffle may throw FetchFailedException: Direct buffer memory

2016-02-26 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-13510:
--
Description: 
In our cluster, when I test spark-1.6.0 with a SQL query, it throws an exception 
and fails.
{code}
org.apache.spark.shuffle.FetchFailedException: Direct buffer memory
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:323)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:300)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:167)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:759)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:759)
{code}
  The reason is that when shuffling a big block (like 1 GB), the task will 
allocate the same amount of memory, so it can easily throw 
"FetchFailedException: Direct buffer memory".
  If I add -Dio.netty.noUnsafe=true to spark.executor.extraJavaOptions, it will 
instead throw 
{code}
java.lang.OutOfMemoryError: Java heap space
at 
io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
{code}
  
  In the MapReduce shuffle, the fetcher first judges whether the block can be 
cached in memory, but Spark doesn't. 
  I will add this logic in my own version.

  was:
In our cluster, when I test spark-1.6.0 with a sql, it throw exception and 
failed.
{code}
org.apache.spark.shuffle.FetchFailedException: Direct buffer memory
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:323)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:300)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:167)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:759)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:759)
{code}
  The reason is that when shuffle a big block(like 1G), task will allocate the 
same memory, it will easily throw "FetchFailedException: Direct buffer memory".
  If I add -Dio.netty.noUnsafe=true spark.executor.extraJavaOptions, it will 
throw 
{code}
java.lang.OutOfMemoryError: Java heap space
at 
io.netty.buffer.PoolArena$HeapArena.newUnpooledChunk(PoolArena.java:607)
at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:237)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:215)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:132)
{code}
  
  In mapreduce shuffle, it will firstly judge whether the block can cache in 
memery, but spark doesn't. 
  I will add the logic in my edition.


> Shuffle may throw FetchFailedException: Direct buffer memory
> 
>
> Key: SPARK-13510
> URL: https://issues.apache.org/jira/browse/SPARK-13510
> Project: Spark
>  Issue Type: Bug

[jira] [Commented] (SPARK-13450) SortMergeJoin will OOM when join rows have lot of same keys

2016-02-24 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15162858#comment-15162858
 ] 

Hong Shen commented on SPARK-13450:
---

Thanks, I will add it in my own branch first.

> SortMergeJoin will OOM when join rows have lot of same keys
> ---
>
> Key: SPARK-13450
> URL: https://issues.apache.org/jira/browse/SPARK-13450
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
>   When I run a SQL query with a join, the task throws java.lang.OutOfMemoryError 
> and the query fails. I have set spark.executor.memory to 4096m.
>   SortMergeJoin uses an ArrayBuffer[InternalRow] to store bufferedMatches; if 
> the join has a lot of rows with the same key, it will throw OutOfMemoryError.
> {code}
>   /** Buffered rows from the buffered side of the join. This is empty if 
> there are no matches. */
>   private[this] val bufferedMatches: ArrayBuffer[InternalRow] = new 
> ArrayBuffer[InternalRow]
> {code}
>   Here is the stackTrace:
> {code}
> org.xerial.snappy.SnappyNative.arrayCopy(Native Method)
> org.xerial.snappy.Snappy.arrayCopy(Snappy.java:84)
> org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:190)
> org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:163)
> java.io.DataInputStream.readFully(DataInputStream.java:195)
> java.io.DataInputStream.readLong(DataInputStream.java:416)
> org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillReader.loadNext(UnsafeSorterSpillReader.java:71)
> org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillMerger$2.loadNext(UnsafeSorterSpillMerger.java:79)
> org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:136)
> org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:123)
> org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:84)
> org.apache.spark.sql.execution.joins.SortMergeJoinScanner.advancedBufferedToRowWithNullFreeJoinKey(SortMergeJoin.scala:300)
> org.apache.spark.sql.execution.joins.SortMergeJoinScanner.bufferMatchingRows(SortMergeJoin.scala:329)
> org.apache.spark.sql.execution.joins.SortMergeJoinScanner.findNextInnerJoinRows(SortMergeJoin.scala:229)
> org.apache.spark.sql.execution.joins.SortMergeJoin$$anonfun$doExecute$1$$anon$1.advanceNext(SortMergeJoin.scala:105)
> org.apache.spark.sql.execution.RowIteratorToScala.hasNext(RowIterator.scala:68)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:88)
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:89)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:215)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13450) SortMergeJoin will OOM when join rows have lot of same keys

2016-02-23 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160271#comment-15160271
 ] 

Hong Shen commented on SPARK-13450:
---

A join has a lot of rows with the same key.

> SortMergeJoin will OOM when join rows have lot of same keys
> ---
>
> Key: SPARK-13450
> URL: https://issues.apache.org/jira/browse/SPARK-13450
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
>   When I run a SQL query with a join, the task throws java.lang.OutOfMemoryError 
> and the query fails. I have set spark.executor.memory to 4096m.
>   SortMergeJoin uses an ArrayBuffer[InternalRow] to store bufferedMatches; if 
> the join has a lot of rows with the same key, it will throw OutOfMemoryError.
> {code}
>   /** Buffered rows from the buffered side of the join. This is empty if 
> there are no matches. */
>   private[this] val bufferedMatches: ArrayBuffer[InternalRow] = new 
> ArrayBuffer[InternalRow]
> {code}
>   Here is the stackTrace:
> {code}
> org.xerial.snappy.SnappyNative.arrayCopy(Native Method)
> org.xerial.snappy.Snappy.arrayCopy(Snappy.java:84)
> org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:190)
> org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:163)
> java.io.DataInputStream.readFully(DataInputStream.java:195)
> java.io.DataInputStream.readLong(DataInputStream.java:416)
> org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillReader.loadNext(UnsafeSorterSpillReader.java:71)
> org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillMerger$2.loadNext(UnsafeSorterSpillMerger.java:79)
> org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:136)
> org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:123)
> org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:84)
> org.apache.spark.sql.execution.joins.SortMergeJoinScanner.advancedBufferedToRowWithNullFreeJoinKey(SortMergeJoin.scala:300)
> org.apache.spark.sql.execution.joins.SortMergeJoinScanner.bufferMatchingRows(SortMergeJoin.scala:329)
> org.apache.spark.sql.execution.joins.SortMergeJoinScanner.findNextInnerJoinRows(SortMergeJoin.scala:229)
> org.apache.spark.sql.execution.joins.SortMergeJoin$$anonfun$doExecute$1$$anon$1.advanceNext(SortMergeJoin.scala:105)
> org.apache.spark.sql.execution.RowIteratorToScala.hasNext(RowIterator.scala:68)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:88)
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:89)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:215)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13450) SortMergeJoin will OOM when join rows have lot of same keys

2016-02-23 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158566#comment-15158566
 ] 

Hong Shen commented on SPARK-13450:
---

I think we should add an ExternalAppendOnlyArrayBuffer to replace the ArrayBuffer; 
a minimal sketch of the idea is below. I will add it in my own branch first.
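
A rough sketch of what I mean follows. The class name and threshold are made up, 
and a real version would have to go through Spark's serializers and memory manager 
rather than plain Java serialization; this only illustrates an append-only buffer 
that keeps the first rows in memory and spills the rest to a temp file.
{code}
// Sketch only: an append-only buffer that spills to disk past a threshold, so a
// structure like bufferedMatches can grow past the heap without OOM.
import java.io.{File, FileInputStream, FileOutputStream,
  ObjectInputStream, ObjectOutputStream}
import scala.collection.mutable.ArrayBuffer

class ExternalAppendOnlyBuffer[T <: Serializable](spillThreshold: Int = 10000) {
  private val inMemory = new ArrayBuffer[T]
  private var spillFile: File = _
  private var spillOut: ObjectOutputStream = _
  private var spilledCount = 0L

  def append(row: T): Unit = {
    if (inMemory.length < spillThreshold) {
      inMemory += row
    } else {
      // Past the threshold: append to a temp file instead of the heap.
      if (spillOut == null) {
        spillFile = File.createTempFile("buffered-matches-", ".spill")
        spillOut = new ObjectOutputStream(new FileOutputStream(spillFile))
      }
      spillOut.writeObject(row)
      spilledCount += 1
    }
  }

  /** Iterate over the in-memory rows first, then replay the spilled rows.
   *  (The spill reader is not closed here; a real version would manage that.) */
  def iterator: Iterator[T] = {
    if (spillOut != null) spillOut.flush()
    val diskIter: Iterator[T] =
      if (spillFile == null) Iterator.empty
      else {
        val in = new ObjectInputStream(new FileInputStream(spillFile))
        Iterator.fill(spilledCount.toInt)(in.readObject().asInstanceOf[T])
      }
    inMemory.iterator ++ diskIter
  }

  def clear(): Unit = {
    inMemory.clear()
    if (spillOut != null) { spillOut.close(); spillOut = null }
    if (spillFile != null) { spillFile.delete(); spillFile = null }
    spilledCount = 0
  }
}
{code}
With something like this, bufferedMatches could keep appending matching rows past 
the in-memory threshold without holding them all on the heap.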

> SortMergeJoin will OOM when join rows have lot of same keys
> ---
>
> Key: SPARK-13450
> URL: https://issues.apache.org/jira/browse/SPARK-13450
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Hong Shen
>
>   When I run a SQL query with a join, the task throws java.lang.OutOfMemoryError 
> and the query fails. I have set spark.executor.memory to 4096m.
>   SortMergeJoin uses an ArrayBuffer[InternalRow] to store bufferedMatches; if 
> the join has a lot of rows with the same key, it will throw OutOfMemoryError.
> {code}
>   /** Buffered rows from the buffered side of the join. This is empty if 
> there are no matches. */
>   private[this] val bufferedMatches: ArrayBuffer[InternalRow] = new 
> ArrayBuffer[InternalRow]
> {code}
>   Here is the stackTrace:
> {code}
> org.xerial.snappy.SnappyNative.arrayCopy(Native Method)
> org.xerial.snappy.Snappy.arrayCopy(Snappy.java:84)
> org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:190)
> org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:163)
> java.io.DataInputStream.readFully(DataInputStream.java:195)
> java.io.DataInputStream.readLong(DataInputStream.java:416)
> org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillReader.loadNext(UnsafeSorterSpillReader.java:71)
> org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillMerger$2.loadNext(UnsafeSorterSpillMerger.java:79)
> org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:136)
> org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:123)
> org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:84)
> org.apache.spark.sql.execution.joins.SortMergeJoinScanner.advancedBufferedToRowWithNullFreeJoinKey(SortMergeJoin.scala:300)
> org.apache.spark.sql.execution.joins.SortMergeJoinScanner.bufferMatchingRows(SortMergeJoin.scala:329)
> org.apache.spark.sql.execution.joins.SortMergeJoinScanner.findNextInnerJoinRows(SortMergeJoin.scala:229)
> org.apache.spark.sql.execution.joins.SortMergeJoin$$anonfun$doExecute$1$$anon$1.advanceNext(SortMergeJoin.scala:105)
> org.apache.spark.sql.execution.RowIteratorToScala.hasNext(RowIterator.scala:68)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:88)
> org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:89)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:215)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:744)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13450) SortMergeJoin will OOM when join rows have lot of same keys

2016-02-23 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-13450:
--
Description: 
  When I run a SQL query with a join, tasks throw java.lang.OutOfMemoryError and 
the query fails, even though I have set spark.executor.memory to 4096m.
  SortMergeJoin uses an ArrayBuffer[InternalRow] to store bufferedMatches; if the 
join rows share many identical keys, it will throw OutOfMemoryError.

{code}
  /** Buffered rows from the buffered side of the join. This is empty if there 
are no matches. */
  private[this] val bufferedMatches: ArrayBuffer[InternalRow] = new 
ArrayBuffer[InternalRow]
{code}


  Here is the stackTrace:
{code}
org.xerial.snappy.SnappyNative.arrayCopy(Native Method)
org.xerial.snappy.Snappy.arrayCopy(Snappy.java:84)
org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:190)
org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:163)
java.io.DataInputStream.readFully(DataInputStream.java:195)
java.io.DataInputStream.readLong(DataInputStream.java:416)
org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillReader.loadNext(UnsafeSorterSpillReader.java:71)
org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillMerger$2.loadNext(UnsafeSorterSpillMerger.java:79)
org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:136)
org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:123)
org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:84)
org.apache.spark.sql.execution.joins.SortMergeJoinScanner.advancedBufferedToRowWithNullFreeJoinKey(SortMergeJoin.scala:300)
org.apache.spark.sql.execution.joins.SortMergeJoinScanner.bufferMatchingRows(SortMergeJoin.scala:329)
org.apache.spark.sql.execution.joins.SortMergeJoinScanner.findNextInnerJoinRows(SortMergeJoin.scala:229)
org.apache.spark.sql.execution.joins.SortMergeJoin$$anonfun$doExecute$1$$anon$1.advanceNext(SortMergeJoin.scala:105)
org.apache.spark.sql.execution.RowIteratorToScala.hasNext(RowIterator.scala:68)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:88)
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:89)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:215)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)
{code}

  was:
  When I run a sql with join, task throw  java.lang.OutOfMemoryError and sql 
failed. I have set spark.executor.memory  4096m.
  SortMergeJoin use a ArrayBuffer[InternalRow] to store bufferedMatches, if the 
join rows have a lot of same key, it will throw OutOfMemoryError.

{code:title=Bar.java|borderStyle=solid}
  /** Buffered rows from the buffered side of the join. This is empty if there 
are no matches. */
  private[this] val bufferedMatches: ArrayBuffer[InternalRow] = new 
ArrayBuffer[InternalRow]
{code}


  Here is the stackTrace:
org.xerial.snappy.SnappyNative.arrayCopy(Native Method)
org.xerial.snappy.Snappy.arrayCopy(Snappy.java:84)
org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:190)
org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:163)
java.io.DataInputStream.readFully(DataInputStream.java:195)
java.io.DataInputStream.readLong(DataInputStream.java:416)
org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillReader.loadNext(UnsafeSorterSpillReader.java:71)
org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillMerger$2.loadNext(UnsafeSorterSpillMerger.java:79)
org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:136)
org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:123)
org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:84)

[jira] [Created] (SPARK-13450) SortMergeJoin will OOM when join rows have lot of same keys

2016-02-23 Thread Hong Shen (JIRA)
Hong Shen created SPARK-13450:
-

 Summary: SortMergeJoin will OOM when join rows have lot of same 
keys
 Key: SPARK-13450
 URL: https://issues.apache.org/jira/browse/SPARK-13450
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0
Reporter: Hong Shen


  When I run a SQL query with a join, tasks throw java.lang.OutOfMemoryError and 
the query fails, even though I have set spark.executor.memory to 4096m.
  SortMergeJoin uses an ArrayBuffer[InternalRow] to store bufferedMatches; if the 
join rows share many identical keys, it will throw OutOfMemoryError.

{code}
  /** Buffered rows from the buffered side of the join. This is empty if there 
are no matches. */
  private[this] val bufferedMatches: ArrayBuffer[InternalRow] = new 
ArrayBuffer[InternalRow]
{code}


  Here is the stackTrace:
org.xerial.snappy.SnappyNative.arrayCopy(Native Method)
org.xerial.snappy.Snappy.arrayCopy(Snappy.java:84)
org.xerial.snappy.SnappyInputStream.rawRead(SnappyInputStream.java:190)
org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:163)
java.io.DataInputStream.readFully(DataInputStream.java:195)
java.io.DataInputStream.readLong(DataInputStream.java:416)
org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillReader.loadNext(UnsafeSorterSpillReader.java:71)
org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillMerger$2.loadNext(UnsafeSorterSpillMerger.java:79)
org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:136)
org.apache.spark.sql.execution.UnsafeExternalRowSorter$1.next(UnsafeExternalRowSorter.java:123)
org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:84)
org.apache.spark.sql.execution.joins.SortMergeJoinScanner.advancedBufferedToRowWithNullFreeJoinKey(SortMergeJoin.scala:300)
org.apache.spark.sql.execution.joins.SortMergeJoinScanner.bufferMatchingRows(SortMergeJoin.scala:329)
org.apache.spark.sql.execution.joins.SortMergeJoinScanner.findNextInnerJoinRows(SortMergeJoin.scala:229)
org.apache.spark.sql.execution.joins.SortMergeJoin$$anonfun$doExecute$1$$anon$1.advanceNext(SortMergeJoin.scala:105)
org.apache.spark.sql.execution.RowIteratorToScala.hasNext(RowIterator.scala:68)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:88)
org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1$$anonfun$2.apply(TungstenAggregate.scala:86)
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:741)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:337)
org.apache.spark.rdd.RDD.iterator(RDD.scala:301)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
org.apache.spark.scheduler.Task.run(Task.scala:89)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:215)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7271) Redesign shuffle interface for binary processing

2015-10-16 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14960363#comment-14960363
 ] 

Hong Shen commented on SPARK-7271:
--

Hi, I have a question: do you plan to redesign the shuffle reader to implement 
binary processing? If so, when do you expect to complete it?

> Redesign shuffle interface for binary processing
> 
>
> Key: SPARK-7271
> URL: https://issues.apache.org/jira/browse/SPARK-7271
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Reporter: Reynold Xin
>Assignee: Josh Rosen
>
> Current shuffle interface is not exactly ideal for binary processing. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10918) Task failed because executor kill by driver

2015-10-03 Thread Hong Shen (JIRA)
Hong Shen created SPARK-10918:
-

 Summary: Task failed because executor kill by driver
 Key: SPARK-10918
 URL: https://issues.apache.org/jira/browse/SPARK-10918
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.5.1
Reporter: Hong Shen


When dynamic allocation is enabled and an executor hits its idle timeout, it is 
killed by the driver. If a task is offered to that executor at the same time, the 
task fails with an executor-lost error. The settings involved are sketched below.
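
For illustration only, a minimal sketch of the dynamic-allocation settings involved in this race; the concrete values are assumptions, not taken from the report:
{code}
import org.apache.spark.SparkConf

// Illustrative configuration (values are assumptions, not from the report).
// With dynamic allocation on, an executor idle longer than executorIdleTimeout
// is asked to be killed by the driver; a task scheduled onto it in that window
// can then fail with ExecutorLostFailure.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .set("spark.shuffle.service.enabled", "true") // external shuffle service is required for dynamic allocation
{code}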



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8403) Pruner partition won't effective when partition field and fieldSchema exist in sql predicate

2015-06-17 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-8403:
-
Description: 
When a partition field and a regular column (fieldSchema) both appear in the SQL 
predicates, partition pruning is not effective.
Here is the SQL:
{code}
select r.uin,r.vid,r.ctype,r.bakstr2,r.cmd from t_dw_qqlive_209026 r where 
r.cmd = 2 and (r.imp_date = 20150615 or and hour(r.itimestamp)16)
{code}
Table t_dw_qqlive_209026 is partitioned by imp_date; itimestamp is a regular 
column (fieldSchema) in t_dw_qqlive_209026.
When run on Hive, only the data in partition 20150615 is scanned, but when run on 
Spark SQL, the whole table t_dw_qqlive_209026 is scanned. One way to check the 
pruning is sketched below.
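
For illustration only, a sketch of how one might confirm whether the partition filter is pushed down by inspecting the physical plan. The session setup (a HiveContext over an existing SparkContext `sc`, as in spark-shell) and the simplified predicate are assumptions, not taken from the report:
{code}
import org.apache.spark.sql.hive.HiveContext

// Assumes an already-created SparkContext `sc` (e.g. from spark-shell).
val hiveContext = new HiveContext(sc)

// Simplified query that keeps only the partition filter; in the extended plan
// output, the HiveTableScan should list just partition imp_date=20150615 if
// pruning took effect.
hiveContext.sql(
  """select r.uin, r.vid, r.ctype, r.bakstr2, r.cmd
    |from t_dw_qqlive_209026 r
    |where r.cmd = 2 and r.imp_date = 20150615""".stripMargin)
  .explain(true)
{code}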



  was:
When partition field and fieldSchema exist in sql predicates, pruner partition 
won't effective.
Here is the sql,
{code}
select r.uin,r.vid,r.ctype,r.bakstr2,r.cmd from t_dw_qqlive_209026 r where 
r.cmd = 2 and (r.imp_date = 20150615 or and hour(r.itimestamp)16)
{code}
When run on hive, it will only scan data in partition 20150615, but if run on 
spark sql, it will scan the whole table t_dw_qqlive_209026.




 Pruner partition won't effective when partition field and fieldSchema exist 
 in sql predicate
 

 Key: SPARK-8403
 URL: https://issues.apache.org/jira/browse/SPARK-8403
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Hong Shen

 When partition field and fieldSchema exist in sql predicates, pruner 
 partition won't effective.
 Here is the sql,
 {code}
 select r.uin,r.vid,r.ctype,r.bakstr2,r.cmd from t_dw_qqlive_209026 r 
 where r.cmd = 2 and (r.imp_date = 20150615 or and hour(r.itimestamp)16)
 {code}
 Table t_dw_qqlive_209026  is partition by imp_date, itimestamp is a 
 fieldSchema in t_dw_qqlive_209026.
 When run on hive, it will only scan data in partition 20150615, but if run on 
 spark sql, it will scan the whole table t_dw_qqlive_209026.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8403) Pruner partition won't effective when partition field and fieldSchema exit in sql predicate

2015-06-17 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-8403:
-
Description: 
When partition field and fieldSchema exist in sql predicates, pruner partition 
won't effective.
Here is the sql,
{code}
select r.uin,r.vid,r.ctype,r.bakstr2,r.cmd from t_dw_qqlive_209026 r where 
r.cmd = 2 and (r.imp_date = 20150615 or and hour(r.itimestamp)16)
{code}
When run on hive, it will only scan data in partition 20150615, but if run on 
spark sql, it will scan the whole table t_dw_qqlive_209026.



  was:
When udf exit in sql predicates, pruner partition won't effective.
Here is the sql,
{code}
select r.uin,r.vid,r.ctype,r.bakstr2,r.cmd from t_dw_qqlive_209026 r where 
r.cmd = 2 and (r.imp_date = 20150615 or and hour(r.itimestamp)16)
{code}
When run on hive, it will only scan data in partition 20150615, but if run on 
spark sql, it will scan the whole table t_dw_qqlive_209026.




 Pruner partition won't effective when partition field and fieldSchema exit in 
 sql predicate
 ---

 Key: SPARK-8403
 URL: https://issues.apache.org/jira/browse/SPARK-8403
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Hong Shen

 When partition field and fieldSchema exist in sql predicates, pruner 
 partition won't effective.
 Here is the sql,
 {code}
 select r.uin,r.vid,r.ctype,r.bakstr2,r.cmd from t_dw_qqlive_209026 r 
 where r.cmd = 2 and (r.imp_date = 20150615 or and hour(r.itimestamp)16)
 {code}
 When run on hive, it will only scan data in partition 20150615, but if run on 
 spark sql, it will scan the whole table t_dw_qqlive_209026.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8403) Pruner partition won't effective when partition field and fieldSchema exist in sql predicate

2015-06-17 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-8403:
-
Summary: Pruner partition won't effective when partition field and 
fieldSchema exist in sql predicate  (was: Pruner partition won't effective when 
partition field and fieldSchema exit in sql predicate)

 Pruner partition won't effective when partition field and fieldSchema exist 
 in sql predicate
 

 Key: SPARK-8403
 URL: https://issues.apache.org/jira/browse/SPARK-8403
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Hong Shen

 When partition field and fieldSchema exist in sql predicates, pruner 
 partition won't effective.
 Here is the sql,
 {code}
 select r.uin,r.vid,r.ctype,r.bakstr2,r.cmd from t_dw_qqlive_209026 r 
 where r.cmd = 2 and (r.imp_date = 20150615 or and hour(r.itimestamp)16)
 {code}
 When run on hive, it will only scan data in partition 20150615, but if run on 
 spark sql, it will scan the whole table t_dw_qqlive_209026.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8403) Pruner partition won't effective when partition field and fieldSchema exit in sql predicate

2015-06-17 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-8403:
-
Summary: Pruner partition won't effective when partition field and 
fieldSchema exit in sql predicate  (was: Pruner partition won't effective when 
udf exit in sql predicates)

 Pruner partition won't effective when partition field and fieldSchema exit in 
 sql predicate
 ---

 Key: SPARK-8403
 URL: https://issues.apache.org/jira/browse/SPARK-8403
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Hong Shen

 When udf exit in sql predicates, pruner partition won't effective.
 Here is the sql,
 {code}
 select r.uin,r.vid,r.ctype,r.bakstr2,r.cmd from t_dw_qqlive_209026 r 
 where r.cmd = 2 and (r.imp_date = 20150615 or and hour(r.itimestamp)16)
 {code}
 When run on hive, it will only scan data in partition 20150615, but if run on 
 spark sql, it will scan the whole table t_dw_qqlive_209026.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8403) Pruner partition won't effective when udf exit in sql predicates

2015-06-17 Thread Hong Shen (JIRA)
Hong Shen created SPARK-8403:


 Summary: Pruner partition won't effective when udf exit in sql 
predicates
 Key: SPARK-8403
 URL: https://issues.apache.org/jira/browse/SPARK-8403
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Hong Shen


When a UDF appears in the SQL predicates, partition pruning is not effective.
Here is the SQL:
{code}
select r.uin,r.vid,r.ctype,r.bakstr2,r.cmd from t_dw_qqlive_209026 r where 
r.cmd = 2 and (r.imp_date = 20150615 or and hour(r.itimestamp)16)
{code}
When run on Hive, only the data in partition 20150615 is scanned, but when run on 
Spark SQL, the whole table t_dw_qqlive_209026 is scanned.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8403) Pruner partition won't effective when udf exit in sql predicates

2015-06-17 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-8403:
-
Description: 
When udf exit in sql predicates, pruner partition won't effective.
Here is the sql,
{code}
select r.uin,r.vid,r.ctype,r.bakstr2,r.cmd from t_dw_qqlive_209026 r where 
r.cmd = 2 and (r.imp_date = 20150615 or and hour(r.itimestamp)16)
{code}
When run on hive, it will only scan data in partition 20150615, but if run on 
spark sql, it will scan the whole table t_dw_qqlive_209026.



  was:
When udf exit in sql predicates, pruner partition won't effective.
Here is the sql,
{code}
select r.uin,r.vid,r.ctype,r.bakstr2,r.cmd from t_dw_qqlive_209026 r where 
r.cmd = 2 and (r.imp_date = 20150615 or and hour(r.itimestamp)16)
{code}
When run on hive, it will only scan data in partition 20150615, but if run on 
spark sql, it will scan the whole table from t_dw_qqlive_209026.




 Pruner partition won't effective when udf exit in sql predicates
 

 Key: SPARK-8403
 URL: https://issues.apache.org/jira/browse/SPARK-8403
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Hong Shen

 When udf exit in sql predicates, pruner partition won't effective.
 Here is the sql,
 {code}
 select r.uin,r.vid,r.ctype,r.bakstr2,r.cmd from t_dw_qqlive_209026 r 
 where r.cmd = 2 and (r.imp_date = 20150615 or and hour(r.itimestamp)16)
 {code}
 When run on hive, it will only scan data in partition 20150615, but if run on 
 spark sql, it will scan the whole table t_dw_qqlive_209026.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8403) Pruner partition won't effective when udf exit in sql predicates

2015-06-17 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-8403:
-
Description: 
When udf exit in sql predicates, pruner partition won't effective.
Here is the sql,
{code}
select r.uin,r.vid,r.ctype,r.bakstr2,r.cmd from t_dw_qqlive_209026 r where 
r.cmd = 2 and (r.imp_date = 20150615 or and hour(r.itimestamp)16)
{code}
When run on hive, it will only scan data in partition 20150615, but if run on 
spark sql, it will scan the whole table from t_dw_qqlive_209026.



  was:
When udf exit in sql predicates, pruner partition won't effective.
Here is the sql,
{code}
select r.uin,r.vid,r.ctype,r.bakstr2,r.cmd from t_dw_qqlive_209026 r where 
r.cmd = 2 and (r.imp_date = 20150615 or and hour(r.itimestamp)16)
{code}
When run on hive, it will only scan data in partition 20150615, but if run on 
spark sql, it will scan the whole table fromt_dw_qqlive_209026.




 Pruner partition won't effective when udf exit in sql predicates
 

 Key: SPARK-8403
 URL: https://issues.apache.org/jira/browse/SPARK-8403
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Hong Shen

 When udf exit in sql predicates, pruner partition won't effective.
 Here is the sql,
 {code}
 select r.uin,r.vid,r.ctype,r.bakstr2,r.cmd from t_dw_qqlive_209026 r 
 where r.cmd = 2 and (r.imp_date = 20150615 or and hour(r.itimestamp)16)
 {code}
 When run on hive, it will only scan data in partition 20150615, but if run on 
 spark sql, it will scan the whole table from t_dw_qqlive_209026.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5529) BlockManager heartbeat expiration does not kill executor

2015-04-24 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14512244#comment-14512244
 ] 

Hong Shen commented on SPARK-5529:
--

Version 1.4.0 should be released in June.

 BlockManager heartbeat expiration does not kill executor
 

 Key: SPARK-5529
 URL: https://issues.apache.org/jira/browse/SPARK-5529
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, YARN
Affects Versions: 1.2.0
Reporter: Hong Shen
Assignee: Hong Shen
 Fix For: 1.4.0

 Attachments: SPARK-5529.patch


 When I run a Spark job, one executor hangs; after 120s, its BlockManager is 
 removed by the driver, but it takes about half an hour before the executor 
 itself is removed by the driver. Here is the log:
 {code}
 15/02/02 14:58:43 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(1, 10.215.143.14, 47234) with no recent heart beats: 147198ms 
 exceeds 12ms
 
 15/02/02 15:26:55 ERROR YarnClientClusterScheduler: Lost executor 1 on 
 10.215.143.14: remote Akka client disassociated
 15/02/02 15:26:55 WARN ReliableDeliverySupervisor: Association with remote 
 system [akka.tcp://sparkExecutor@10.215.143.14:46182] has failed, address is 
 now gated for [5000] ms. Reason is: [Disassociated].
 15/02/02 15:26:55 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 
 0.0
 15/02/02 15:26:55 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3, 
 10.215.143.14): ExecutorLostFailure (executor 1 lost)
 15/02/02 15:26:55 ERROR YarnClientSchedulerBackend: Asked to remove 
 non-existent executor 1
 15/02/02 15:26:55 INFO DAGScheduler: Executor lost: 1 (epoch 0)
 15/02/02 15:26:55 INFO BlockManagerMasterActor: Trying to remove executor 1 
 from BlockManagerMaster.
 15/02/02 15:26:55 INFO BlockManagerMaster: Removed 1 successfully in 
 removeExecutor
 {code}
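
For context, an illustrative sketch of the timeout that triggers the BlockManager removal above. The key name and value shown are assumptions about the setup, not something stated in the report:
{code}
import org.apache.spark.SparkConf

// Illustration only: the heartbeat expiry that removes a BlockManager is driven
// by a configurable timeout (assumed here to be the key below), while the
// executor itself is only removed much later, when its Akka connection drops.
val conf = new SparkConf()
  .set("spark.storage.blockManagerSlaveTimeoutMs", "120000") // 120s, matching the log above
{code}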



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6738) EstimateSize is difference with spill file size

2015-04-20 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-6738:
-
Description: 
ExternalAppendOnlyMap spill 2.2 GB data to disk:

{code}

15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: Thread 54 spilling 
in-memory map of 2.2 GB to disk (61 times so far)
15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: 
/data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}

But the file size is only 2.2M.

{code}
ll -h 
/data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/
total 2.2M
-rw-r- 1 spark users 2.2M Apr  7 20:27 
temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}

The GC log show that the jvm memory is less than 1GB.
{code}
2015-04-07T20:27:08.023+0800: [GC 981981K-55363K(3961344K), 0.0341720 secs]
2015-04-07T20:27:14.483+0800: [GC 987523K-53737K(3961344K), 0.0252660 secs]
2015-04-07T20:27:20.793+0800: [GC 985897K-56370K(3961344K), 0.0606460 secs]
2015-04-07T20:27:27.553+0800: [GC 988530K-59089K(3961344K), 0.0651840 secs]
2015-04-07T20:27:34.067+0800: [GC 991249K-62153K(3961344K), 0.0288460 secs]
2015-04-07T20:27:40.180+0800: [GC 994313K-61344K(3961344K), 0.0388970 secs]
2015-04-07T20:27:46.490+0800: [GC 993504K-59915K(3961344K), 0.0235150 secs]
{code}

The estimateSize  is hugh difference with spill file size, there is a bug in 

  was:
ExternalAppendOnlyMap spill 2.2 GB data to disk:

{code}

15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: Thread 54 spilling 
in-memory map of 2.2 GB to disk (61 times so far)
15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: 
/data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}

But the file size is only 2.2M.

{code}
ll -h 
/data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/
total 2.2M
-rw-r- 1 spark users 2.2M Apr  7 20:27 
temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}

The GC log show that the jvm memory is less than 1GB.
{code}
2015-04-07T20:27:08.023+0800: [GC 981981K-55363K(3961344K), 0.0341720 secs]
2015-04-07T20:27:14.483+0800: [GC 987523K-53737K(3961344K), 0.0252660 secs]
2015-04-07T20:27:20.793+0800: [GC 985897K-56370K(3961344K), 0.0606460 secs]
2015-04-07T20:27:27.553+0800: [GC 988530K-59089K(3961344K), 0.0651840 secs]
2015-04-07T20:27:34.067+0800: [GC 991249K-62153K(3961344K), 0.0288460 secs]
2015-04-07T20:27:40.180+0800: [GC 994313K-61344K(3961344K), 0.0388970 secs]
2015-04-07T20:27:46.490+0800: [GC 993504K-59915K(3961344K), 0.0235150 secs]
{code}

The estimateSize  is hugh difference with spill file size


 EstimateSize  is difference with spill file size
 

 Key: SPARK-6738
 URL: https://issues.apache.org/jira/browse/SPARK-6738
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.2.0
Reporter: Hong Shen

 ExternalAppendOnlyMap spill 2.2 GB data to disk:
 {code}
 15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: Thread 54 spilling 
 in-memory map of 2.2 GB to disk (61 times so far)
 15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: 
 /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
 {code}
 But the file size is only 2.2M.
 {code}
 ll -h 
 /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/
 total 2.2M
 -rw-r- 1 spark users 2.2M Apr  7 20:27 
 temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
 {code}
 The GC log show that the jvm memory is less than 1GB.
 {code}
 2015-04-07T20:27:08.023+0800: [GC 981981K-55363K(3961344K), 0.0341720 secs]
 2015-04-07T20:27:14.483+0800: [GC 987523K-53737K(3961344K), 0.0252660 secs]
 2015-04-07T20:27:20.793+0800: [GC 985897K-56370K(3961344K), 0.0606460 secs]
 2015-04-07T20:27:27.553+0800: [GC 988530K-59089K(3961344K), 0.0651840 secs]
 2015-04-07T20:27:34.067+0800: [GC 991249K-62153K(3961344K), 0.0288460 secs]
 2015-04-07T20:27:40.180+0800: [GC 994313K-61344K(3961344K), 0.0388970 secs]
 2015-04-07T20:27:46.490+0800: [GC 993504K-59915K(3961344K), 0.0235150 secs]
 {code}
 The estimateSize  is hugh difference with spill file size, there is a bug in 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-6738) EstimateSize is difference with spill file size

2015-04-20 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen reopened SPARK-6738:
--

There is a bug in SizeEstimator.

 EstimateSize  is difference with spill file size
 

 Key: SPARK-6738
 URL: https://issues.apache.org/jira/browse/SPARK-6738
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.2.0
Reporter: Hong Shen

 ExternalAppendOnlyMap spill 2.2 GB data to disk:
 {code}
 15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: Thread 54 spilling 
 in-memory map of 2.2 GB to disk (61 times so far)
 15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: 
 /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
 {code}
 But the file size is only 2.2M.
 {code}
 ll -h 
 /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/
 total 2.2M
 -rw-r- 1 spark users 2.2M Apr  7 20:27 
 temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
 {code}
 The GC log show that the jvm memory is less than 1GB.
 {code}
 2015-04-07T20:27:08.023+0800: [GC 981981K-55363K(3961344K), 0.0341720 secs]
 2015-04-07T20:27:14.483+0800: [GC 987523K-53737K(3961344K), 0.0252660 secs]
 2015-04-07T20:27:20.793+0800: [GC 985897K-56370K(3961344K), 0.0606460 secs]
 2015-04-07T20:27:27.553+0800: [GC 988530K-59089K(3961344K), 0.0651840 secs]
 2015-04-07T20:27:34.067+0800: [GC 991249K-62153K(3961344K), 0.0288460 secs]
 2015-04-07T20:27:40.180+0800: [GC 994313K-61344K(3961344K), 0.0388970 secs]
 2015-04-07T20:27:46.490+0800: [GC 993504K-59915K(3961344K), 0.0235150 secs]
 {code}
 The estimateSize  is hugh difference with spill file size



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6738) EstimateSize is difference with spill file size

2015-04-20 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-6738:
-
Description: 
ExternalAppendOnlyMap spill 2.2 GB data to disk:

{code}

15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: Thread 54 spilling 
in-memory map of 2.2 GB to disk (61 times so far)
15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: 
/data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}

But the file size is only 2.2M.

{code}
ll -h 
/data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/
total 2.2M
-rw-r- 1 spark users 2.2M Apr  7 20:27 
temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}

The GC log show that the jvm memory is less than 1GB.
{code}
2015-04-07T20:27:08.023+0800: [GC 981981K-55363K(3961344K), 0.0341720 secs]
2015-04-07T20:27:14.483+0800: [GC 987523K-53737K(3961344K), 0.0252660 secs]
2015-04-07T20:27:20.793+0800: [GC 985897K-56370K(3961344K), 0.0606460 secs]
2015-04-07T20:27:27.553+0800: [GC 988530K-59089K(3961344K), 0.0651840 secs]
2015-04-07T20:27:34.067+0800: [GC 991249K-62153K(3961344K), 0.0288460 secs]
2015-04-07T20:27:40.180+0800: [GC 994313K-61344K(3961344K), 0.0388970 secs]
2015-04-07T20:27:46.490+0800: [GC 993504K-59915K(3961344K), 0.0235150 secs]
{code}

The estimated size is hugely different from the spill file size; there is a bug in 
SizeEstimator.visitArray. A rough way to observe the mismatch is sketched below.
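
For illustration only, a rough sketch of how the mismatch can be observed by comparing the estimated in-memory size of a collection with the size of its Java-serialized form. SizeEstimator's visibility differs across Spark versions, so treating it as directly callable here is an assumption:
{code}
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.collection.mutable
import org.apache.spark.util.SizeEstimator

// Build a map whose in-memory footprint (object headers, references, boxing)
// is much larger than its serialized footprint.
val map = mutable.HashMap[Int, Array[Byte]]()
(0 until 100000).foreach(i => map(i) = Array.fill(8)(0: Byte))

// The spill decision is based on this estimate of the in-memory size.
val estimated = SizeEstimator.estimate(map)

// Compare with the size of the Java-serialized form.
val bytes = new ByteArrayOutputStream()
val out = new ObjectOutputStream(bytes)
out.writeObject(map)
out.close()

println(s"estimated in-memory size = $estimated bytes, serialized size = ${bytes.size} bytes")
{code}
Note also that spill files are written through Spark's serializer and, typically, compression, which shrinks the on-disk size even further relative to the in-memory estimate.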

  was:
ExternalAppendOnlyMap spill 2.2 GB data to disk:

{code}

15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: Thread 54 spilling 
in-memory map of 2.2 GB to disk (61 times so far)
15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: 
/data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}

But the file size is only 2.2M.

{code}
ll -h 
/data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/
total 2.2M
-rw-r- 1 spark users 2.2M Apr  7 20:27 
temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}

The GC log show that the jvm memory is less than 1GB.
{code}
2015-04-07T20:27:08.023+0800: [GC 981981K-55363K(3961344K), 0.0341720 secs]
2015-04-07T20:27:14.483+0800: [GC 987523K-53737K(3961344K), 0.0252660 secs]
2015-04-07T20:27:20.793+0800: [GC 985897K-56370K(3961344K), 0.0606460 secs]
2015-04-07T20:27:27.553+0800: [GC 988530K-59089K(3961344K), 0.0651840 secs]
2015-04-07T20:27:34.067+0800: [GC 991249K-62153K(3961344K), 0.0288460 secs]
2015-04-07T20:27:40.180+0800: [GC 994313K-61344K(3961344K), 0.0388970 secs]
2015-04-07T20:27:46.490+0800: [GC 993504K-59915K(3961344K), 0.0235150 secs]
{code}

The estimateSize  is hugh difference with spill file size, there is a bug in 


 EstimateSize  is difference with spill file size
 

 Key: SPARK-6738
 URL: https://issues.apache.org/jira/browse/SPARK-6738
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.2.0
Reporter: Hong Shen

 ExternalAppendOnlyMap spill 2.2 GB data to disk:
 {code}
 15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: Thread 54 spilling 
 in-memory map of 2.2 GB to disk (61 times so far)
 15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: 
 /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
 {code}
 But the file size is only 2.2M.
 {code}
 ll -h 
 /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/
 total 2.2M
 -rw-r- 1 spark users 2.2M Apr  7 20:27 
 temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
 {code}
 The GC log show that the jvm memory is less than 1GB.
 {code}
 2015-04-07T20:27:08.023+0800: [GC 981981K-55363K(3961344K), 0.0341720 secs]
 2015-04-07T20:27:14.483+0800: [GC 987523K-53737K(3961344K), 0.0252660 secs]
 2015-04-07T20:27:20.793+0800: [GC 985897K-56370K(3961344K), 0.0606460 secs]
 2015-04-07T20:27:27.553+0800: [GC 988530K-59089K(3961344K), 0.0651840 secs]
 2015-04-07T20:27:34.067+0800: [GC 991249K-62153K(3961344K), 0.0288460 secs]
 2015-04-07T20:27:40.180+0800: [GC 994313K-61344K(3961344K), 0.0388970 secs]
 2015-04-07T20:27:46.490+0800: [GC 993504K-59915K(3961344K), 0.0235150 secs]
 {code}
 The estimateSize  is hugh difference with spill file size, there is a bug in 
 SizeEstimator.visitArray.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: 

[jira] [Comment Edited] (SPARK-6738) EstimateSize is difference with spill file size

2015-04-20 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504202#comment-14504202
 ] 

Hong Shen edited comment on SPARK-6738 at 4/21/15 2:54 AM:
---

There is a bug in SizeEstimator


was (Author: shenhong):
There is a in SizeEstimator

 EstimateSize  is difference with spill file size
 

 Key: SPARK-6738
 URL: https://issues.apache.org/jira/browse/SPARK-6738
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.2.0
Reporter: Hong Shen

 ExternalAppendOnlyMap spill 2.2 GB data to disk:
 {code}
 15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: Thread 54 spilling 
 in-memory map of 2.2 GB to disk (61 times so far)
 15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: 
 /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
 {code}
 But the file size is only 2.2M.
 {code}
 ll -h 
 /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/
 total 2.2M
 -rw-r- 1 spark users 2.2M Apr  7 20:27 
 temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
 {code}
 The GC log show that the jvm memory is less than 1GB.
 {code}
 2015-04-07T20:27:08.023+0800: [GC 981981K-55363K(3961344K), 0.0341720 secs]
 2015-04-07T20:27:14.483+0800: [GC 987523K-53737K(3961344K), 0.0252660 secs]
 2015-04-07T20:27:20.793+0800: [GC 985897K-56370K(3961344K), 0.0606460 secs]
 2015-04-07T20:27:27.553+0800: [GC 988530K-59089K(3961344K), 0.0651840 secs]
 2015-04-07T20:27:34.067+0800: [GC 991249K-62153K(3961344K), 0.0288460 secs]
 2015-04-07T20:27:40.180+0800: [GC 994313K-61344K(3961344K), 0.0388970 secs]
 2015-04-07T20:27:46.490+0800: [GC 993504K-59915K(3961344K), 0.0235150 secs]
 {code}
 The estimateSize  is hugh difference with spill file size



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6738) EstimateSize is difference with spill file size

2015-04-07 Thread Hong Shen (JIRA)
Hong Shen created SPARK-6738:


 Summary: EstimateSize  is difference with spill file size
 Key: SPARK-6738
 URL: https://issues.apache.org/jira/browse/SPARK-6738
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.2.0
Reporter: Hong Shen


ExternalAppendOnlyMap spills 1100 MB of data to disk:
15/04/07 16:39:48 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.5 MB to disk (12 times so far)
/data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-994b/30/temp_local_e4347165-6263-4678-9f1d-67ad4bcd8fb5
15/04/07 16:39:49 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.3 MB to disk (13 times so far)
/data6/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-1e29/26/temp_local_76f9900b-1b3d-4cef-b3a2-6afcde14bbd9
15/04/07 16:39:49 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1105.8 MB to disk (14 times so far)
/data7/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-f883/26/temp_local_3ade0aec-ac1d-469d-bc99-b6fa87cb649b
15/04/07 16:39:50 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.8 MB to disk (15 times so far)

But the file size is only 1.1M.
[tdwadmin@tdw-10-215-149-231 
~/tdwenv/tdwgaia/logs/container-logs/application_1423737010718_40308573/container_1423737010718_40308573_01_08]$
 ll -h 
/data7/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-f883/26/temp_local_3ade0aec-ac1d-469d-bc99-b6fa87cb649b
-rw-r- 1 spark users 1.1M Apr  7 16:39 
/data7/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-f883/26/temp_local_3ade0aec-ac1d-469d-bc99-b6fa87cb649b

The estimated size is hugely different from the spill file size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6738) EstimateSize is difference with spill file size

2015-04-07 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-6738:
-
Description: 
ExternalAppendOnlyMap spill 1100M data to disk:

{code}
15/04/07 16:39:48 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.5 MB to disk (12 times so far)
/data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-994b/30/temp_local_e4347165-6263-4678-9f1d-67ad4bcd8fb5
15/04/07 16:39:49 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.3 MB to disk (13 times so far)
/data6/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-1e29/26/temp_local_76f9900b-1b3d-4cef-b3a2-6afcde14bbd9
15/04/07 16:39:49 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1105.8 MB to disk (14 times so far)
/data7/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-f883/26/temp_local_3ade0aec-ac1d-469d-bc99-b6fa87cb649b
15/04/07 16:39:50 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.8 MB to disk (15 times so far)
{code}

But the file size is only 1.1M.

{code}
[tdwadmin@tdw-10-215-149-231 
~/tdwenv/tdwgaia/logs/container-logs/application_1423737010718_40308573/container_1423737010718_40308573_01_08]$
 ll -h 
/data7/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-f883/26/temp_local_3ade0aec-ac1d-469d-bc99-b6fa87cb649b
-rw-r- 1 spark users 1.1M Apr  7 16:39 
/data7/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-f883/26/temp_local_3ade0aec-ac1d-469d-bc99-b6fa87cb649b
{code}

Here is the other spilled file.
{code}
[tdwadmin@tdw-10-215-149-231 
~/tdwenv/tdwgaia/logs/container-logs/application_1423737010718_40308573/container_1423737010718_40308573_01_08]$
 ll -h 
/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/*
 
/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/09:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:39 
temp_local_3a568e10-3997-4d13-adf1-e8dfe4ba4727

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/18:
total 2.2M
-rw-r- 1 spark users 1.1M Apr  7 16:41 
temp_local_66c0df48-5d79-448b-8989-84ce1a5507d0
-rw-r- 1 spark users 1.1M Apr  7 16:39 
temp_local_f6870214-bfd5-47b2-b0b9-37194b55761b

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/1a:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:40 
temp_local_ba1712d2-0eb8-4833-9fa6-a87ee670826c

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/1b:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:41 
temp_local_1d1df5b7-846c-4bcd-a9de-c328e50e62db

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/1f:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:41 
temp_local_38c6a144-b588-49b1-b0f0-b91b31c2e85f

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/20:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:40 
temp_local_d816a301-7520-4b6e-8866-bc445f04f0c9

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/24:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:40 
temp_local_7a3619cf-20e1-4815-8b1f-faeefde40d73

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/2e:
total 2.2M
-rw-r- 1 spark users 1.1M Apr  7 16:39 
temp_local_1e366a65-ced7-4b17-a085-5f873ff6dc43
-rw-r- 1 spark users 1.1M Apr  7 16:41 
temp_local_6772b815-11ee-413a-bc6c-d0dc9cbffc51
{code}

The estimateSize  is hugh difference with spill file size

  was:
ExternalAppendOnlyMap spill 1100M data to disk:

{code}
15/04/07 16:39:48 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.5 MB to disk (12 times so far)
/data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-994b/30/temp_local_e4347165-6263-4678-9f1d-67ad4bcd8fb5
15/04/07 16:39:49 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.3 MB to disk (13 times so far)
/data6/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-1e29/26/temp_local_76f9900b-1b3d-4cef-b3a2-6afcde14bbd9
15/04/07 16:39:49 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1105.8 MB to disk (14 times so far)

[jira] [Commented] (SPARK-6738) EstimateSize is difference with spill file size

2015-04-07 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483011#comment-14483011
 ] 

Hong Shen commented on SPARK-6738:
--

Yes, it spills lots of files, but each one is only 1.1M.

 EstimateSize  is difference with spill file size
 

 Key: SPARK-6738
 URL: https://issues.apache.org/jira/browse/SPARK-6738
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.2.0
Reporter: Hong Shen

 ExternalAppendOnlyMap spill 1100M data to disk:
 {code}
 15/04/07 16:39:48 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
 in-memory map of 1106.5 MB to disk (12 times so far)
 /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-994b/30/temp_local_e4347165-6263-4678-9f1d-67ad4bcd8fb5
 15/04/07 16:39:49 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
 in-memory map of 1106.3 MB to disk (13 times so far)
 /data6/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-1e29/26/temp_local_76f9900b-1b3d-4cef-b3a2-6afcde14bbd9
 15/04/07 16:39:49 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
 in-memory map of 1105.8 MB to disk (14 times so far)
 /data7/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-f883/26/temp_local_3ade0aec-ac1d-469d-bc99-b6fa87cb649b
 15/04/07 16:39:50 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
 in-memory map of 1106.8 MB to disk (15 times so far)
 {code}
 But the file size is only 1.1M.
 {code}
 [tdwadmin@tdw-10-215-149-231 
 ~/tdwenv/tdwgaia/logs/container-logs/application_1423737010718_40308573/container_1423737010718_40308573_01_08]$
  ll -h 
 /data7/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-f883/26/temp_local_3ade0aec-ac1d-469d-bc99-b6fa87cb649b
 -rw-r- 1 spark users 1.1M Apr  7 16:39 
 /data7/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-f883/26/temp_local_3ade0aec-ac1d-469d-bc99-b6fa87cb649b
 {code}
 Here is the other spilled file.
 {code}
 [tdwadmin@tdw-10-215-149-231 
 ~/tdwenv/tdwgaia/logs/container-logs/application_1423737010718_40308573/container_1423737010718_40308573_01_08]$
  ll -h 
 /data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/*
  
 /data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/09:
 total 1.1M
 -rw-r- 1 spark users 1.1M Apr  7 16:39 
 temp_local_3a568e10-3997-4d13-adf1-e8dfe4ba4727
 /data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/18:
 total 2.2M
 -rw-r- 1 spark users 1.1M Apr  7 16:41 
 temp_local_66c0df48-5d79-448b-8989-84ce1a5507d0
 -rw-r- 1 spark users 1.1M Apr  7 16:39 
 temp_local_f6870214-bfd5-47b2-b0b9-37194b55761b
 /data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/1a:
 total 1.1M
 -rw-r- 1 spark users 1.1M Apr  7 16:40 
 temp_local_ba1712d2-0eb8-4833-9fa6-a87ee670826c
 /data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/1b:
 total 1.1M
 -rw-r- 1 spark users 1.1M Apr  7 16:41 
 temp_local_1d1df5b7-846c-4bcd-a9de-c328e50e62db
 /data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/1f:
 total 1.1M
 -rw-r- 1 spark users 1.1M Apr  7 16:41 
 temp_local_38c6a144-b588-49b1-b0f0-b91b31c2e85f
 /data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/20:
 total 1.1M
 -rw-r- 1 spark users 1.1M Apr  7 16:40 
 temp_local_d816a301-7520-4b6e-8866-bc445f04f0c9
 /data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/24:
 total 1.1M
 -rw-r- 1 spark users 1.1M Apr  7 16:40 
 temp_local_7a3619cf-20e1-4815-8b1f-faeefde40d73
 /data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/2e:
 total 2.2M
 -rw-r- 1 spark users 1.1M Apr  7 16:39 
 temp_local_1e366a65-ced7-4b17-a085-5f873ff6dc43
 -rw-r- 1 spark users 1.1M Apr  7 16:41 
 temp_local_6772b815-11ee-413a-bc6c-d0dc9cbffc51
 {code}
 The estimateSize  is hugh difference with spill file size



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6738) EstimateSize is difference with spill file size

2015-04-07 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-6738:
-
Description: 
ExternalAppendOnlyMap spill 1100M data to disk:

{code}
15/04/07 16:39:48 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.5 MB to disk (12 times so far)
/data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-994b/30/temp_local_e4347165-6263-4678-9f1d-67ad4bcd8fb5
15/04/07 16:39:49 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.3 MB to disk (13 times so far)
/data6/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-1e29/26/temp_local_76f9900b-1b3d-4cef-b3a2-6afcde14bbd9
15/04/07 16:39:49 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1105.8 MB to disk (14 times so far)
/data7/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-f883/26/temp_local_3ade0aec-ac1d-469d-bc99-b6fa87cb649b
15/04/07 16:39:50 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.8 MB to disk (15 times so far)
{code}

But the file size is only 1.1M.

{code}
[tdwadmin@tdw-10-215-149-231 
~/tdwenv/tdwgaia/logs/container-logs/application_1423737010718_40308573/container_1423737010718_40308573_01_08]$
 ll -h 
/data7/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-f883/26/temp_local_3ade0aec-ac1d-469d-bc99-b6fa87cb649b
-rw-r- 1 spark users 1.1M Apr  7 16:39 
/data7/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-f883/26/temp_local_3ade0aec-ac1d-469d-bc99-b6fa87cb649b
{code}

Here are the other spilled files.
{code}
[tdwadmin@tdw-10-215-149-231 
~/tdwenv/tdwgaia/logs/container-logs/application_1423737010718_40308573/container_1423737010718_40308573_01_08]$
 ll -h 
/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/*
 
/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/09:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:39 
temp_local_3a568e10-3997-4d13-adf1-e8dfe4ba4727

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/18:
total 2.2M
-rw-r- 1 spark users 1.1M Apr  7 16:41 
temp_local_66c0df48-5d79-448b-8989-84ce1a5507d0
-rw-r- 1 spark users 1.1M Apr  7 16:39 
temp_local_f6870214-bfd5-47b2-b0b9-37194b55761b

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/1a:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:40 
temp_local_ba1712d2-0eb8-4833-9fa6-a87ee670826c

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/1b:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:41 
temp_local_1d1df5b7-846c-4bcd-a9de-c328e50e62db

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/1f:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:41 
temp_local_38c6a144-b588-49b1-b0f0-b91b31c2e85f

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/20:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:40 
temp_local_d816a301-7520-4b6e-8866-bc445f04f0c9

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/24:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:40 
temp_local_7a3619cf-20e1-4815-8b1f-faeefde40d73

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/2e:
total 2.2M
-rw-r- 1 spark users 1.1M Apr  7 16:39 
temp_local_1e366a65-ced7-4b17-a085-5f873ff6dc43
-rw-r- 1 spark users 1.1M Apr  7 16:41 
temp_local_6772b815-11ee-413a-bc6c-d0dc9cbffc51
{code}

The estimateSize  is hugh difference with spill file size

  was:
ExternalAppendOnlyMap spill 1100M data to disk:

{code}
15/04/07 16:39:48 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.5 MB to disk (12 times so far)
/data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-994b/30/temp_local_e4347165-6263-4678-9f1d-67ad4bcd8fb5
15/04/07 16:39:49 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.3 MB to disk (13 times so far)
/data6/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-1e29/26/temp_local_76f9900b-1b3d-4cef-b3a2-6afcde14bbd9
15/04/07 16:39:49 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1105.8 MB to disk (14 times so far)

[jira] [Updated] (SPARK-6738) EstimateSize is difference with spill file size

2015-04-07 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-6738:
-
Description: 
ExternalAppendOnlyMap spill 1100M data to disk:

{code}
15/04/07 16:39:48 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.5 MB to disk (12 times so far)
/data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-994b/30/temp_local_e4347165-6263-4678-9f1d-67ad4bcd8fb5
15/04/07 16:39:49 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.3 MB to disk (13 times so far)
/data6/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-1e29/26/temp_local_76f9900b-1b3d-4cef-b3a2-6afcde14bbd9
15/04/07 16:39:49 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1105.8 MB to disk (14 times so far)
/data7/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-f883/26/temp_local_3ade0aec-ac1d-469d-bc99-b6fa87cb649b
15/04/07 16:39:50 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.8 MB to disk (15 times so far)
{code}

But the file size is only 1.1M.

{code}
[tdwadmin@tdw-10-215-149-231 
~/tdwenv/tdwgaia/logs/container-logs/application_1423737010718_40308573/container_1423737010718_40308573_01_08]$
 ll -h 
/data7/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-f883/26/temp_local_3ade0aec-ac1d-469d-bc99-b6fa87cb649b
-rw-r- 1 spark users 1.1M Apr  7 16:39 
/data7/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-f883/26/temp_local_3ade0aec-ac1d-469d-bc99-b6fa87cb649b
{code}

Here are the other spilled files.
{code}
[tdwadmin@tdw-10-215-149-231 
~/tdwenv/tdwgaia/logs/container-logs/application_1423737010718_40308573/container_1423737010718_40308573_01_08]$
 ll -h 
/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/*
 
/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/09:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:39 
temp_local_3a568e10-3997-4d13-adf1-e8dfe4ba4727

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/18:
total 2.2M
-rw-r- 1 spark users 1.1M Apr  7 16:41 
temp_local_66c0df48-5d79-448b-8989-84ce1a5507d0
-rw-r- 1 spark users 1.1M Apr  7 16:39 
temp_local_f6870214-bfd5-47b2-b0b9-37194b55761b

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/1a:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:40 
temp_local_ba1712d2-0eb8-4833-9fa6-a87ee670826c

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/1b:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:41 
temp_local_1d1df5b7-846c-4bcd-a9de-c328e50e62db

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/1f:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:41 
temp_local_38c6a144-b588-49b1-b0f0-b91b31c2e85f

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/20:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:40 
temp_local_d816a301-7520-4b6e-8866-bc445f04f0c9

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/24:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:40 
temp_local_7a3619cf-20e1-4815-8b1f-faeefde40d73

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/2e:
total 2.2M
-rw-r- 1 spark users 1.1M Apr  7 16:39 
temp_local_1e366a65-ced7-4b17-a085-5f873ff6dc43
-rw-r- 1 spark users 1.1M Apr  7 16:41 
temp_local_6772b815-11ee-413a-bc6c-d0dc9cbffc51
{code}

The estimateSize is hugely different from the spill file size.

  was:
ExternalAppendOnlyMap spill 1100M data to disk:

{code}
15/04/07 16:39:48 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.5 MB to disk (12 times so far)
/data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-994b/30/temp_local_e4347165-6263-4678-9f1d-67ad4bcd8fb5
15/04/07 16:39:49 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.3 MB to disk (13 times so far)
/data6/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-1e29/26/temp_local_76f9900b-1b3d-4cef-b3a2-6afcde14bbd9
15/04/07 16:39:49 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1105.8 MB to disk (14 times so far)

[jira] [Updated] (SPARK-6738) EstimateSize is difference with spill file size

2015-04-07 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-6738:
-
Description: 
ExternalAppendOnlyMap spills 2.2 GB of data to disk:

{code}

15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: Thread 54 spilling 
in-memory map of 2.2 GB to disk (61 times so far)
15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: 
/data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}

But the file size is only 2.2M.

{code}
ll -h 
/data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/
total 2.2M
-rw-r- 1 spark users 2.2M Apr  7 20:27 
temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
{code}

The GC log shows that the JVM memory is less than 1 GB.
{code}
2015-04-07T20:27:08.023+0800: [GC 981981K-55363K(3961344K), 0.0341720 secs]
2015-04-07T20:27:14.483+0800: [GC 987523K-53737K(3961344K), 0.0252660 secs]
2015-04-07T20:27:20.793+0800: [GC 985897K-56370K(3961344K), 0.0606460 secs]
2015-04-07T20:27:27.553+0800: [GC 988530K-59089K(3961344K), 0.0651840 secs]
2015-04-07T20:27:34.067+0800: [GC 991249K-62153K(3961344K), 0.0288460 secs]
2015-04-07T20:27:40.180+0800: [GC 994313K-61344K(3961344K), 0.0388970 secs]
2015-04-07T20:27:46.490+0800: [GC 993504K-59915K(3961344K), 0.0235150 secs]
{code}

The estimateSize is hugely different from the spill file size.

  was:
ExternalAppendOnlyMap spill 1100M data to disk:

{code}
15/04/07 16:39:48 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.5 MB to disk (12 times so far)
/data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-994b/30/temp_local_e4347165-6263-4678-9f1d-67ad4bcd8fb5
15/04/07 16:39:49 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.3 MB to disk (13 times so far)
/data6/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-1e29/26/temp_local_76f9900b-1b3d-4cef-b3a2-6afcde14bbd9
15/04/07 16:39:49 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1105.8 MB to disk (14 times so far)
/data7/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-f883/26/temp_local_3ade0aec-ac1d-469d-bc99-b6fa87cb649b
15/04/07 16:39:50 INFO collection.ExternalAppendOnlyMap: Thread 51 spilling 
in-memory map of 1106.8 MB to disk (15 times so far)
{code}

But the file size is only 1.1M.

{code}
[tdwadmin@tdw-10-215-149-231 
~/tdwenv/tdwgaia/logs/container-logs/application_1423737010718_40308573/container_1423737010718_40308573_01_08]$
 ll -h 
/data7/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-f883/26/temp_local_3ade0aec-ac1d-469d-bc99-b6fa87cb649b
-rw-r- 1 spark users 1.1M Apr  7 16:39 
/data7/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-f883/26/temp_local_3ade0aec-ac1d-469d-bc99-b6fa87cb649b
{code}

Here are the other spilled files.
{code}
[tdwadmin@tdw-10-215-149-231 
~/tdwenv/tdwgaia/logs/container-logs/application_1423737010718_40308573/container_1423737010718_40308573_01_08]$
 ll -h 
/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/*
 
/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/09:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:39 
temp_local_3a568e10-3997-4d13-adf1-e8dfe4ba4727

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/18:
total 2.2M
-rw-r- 1 spark users 1.1M Apr  7 16:41 
temp_local_66c0df48-5d79-448b-8989-84ce1a5507d0
-rw-r- 1 spark users 1.1M Apr  7 16:39 
temp_local_f6870214-bfd5-47b2-b0b9-37194b55761b

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/1a:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:40 
temp_local_ba1712d2-0eb8-4833-9fa6-a87ee670826c

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/1b:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:41 
temp_local_1d1df5b7-846c-4bcd-a9de-c328e50e62db

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/1f:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:41 
temp_local_38c6a144-b588-49b1-b0f0-b91b31c2e85f

/data3/yarnenv/local/usercache/spark/appcache/application_1423737010718_40308573/spark-local-20150407163931-fe54/20:
total 1.1M
-rw-r- 1 spark users 1.1M Apr  7 16:40 
temp_local_d816a301-7520-4b6e-8866-bc445f04f0c9


[jira] [Commented] (SPARK-6738) EstimateSize is difference with spill file size

2015-04-07 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14483104#comment-14483104
 ] 

Hong Shen commented on SPARK-6738:
--

I don't think serialization causes the problem. The input data is a Hive 
table, and the job is a Spark SQL query.
In fact, when the log shows an in-memory map of 2.2 GB spilling to disk, the 
file is only 2.2 MB, and the GC log shows the JVM heap stays below 1 GB. The 
estimateSize also deviates from the actual JVM memory usage.
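To make the gap concrete, here is a minimal sketch (the data shape and sizes 
are assumptions, not taken from this job) comparing SizeEstimator's in-memory 
estimate of a map with the size of the same map after Java serialization and 
gzip compression, which is roughly what lands in a spill file:
{code}
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import java.util.zip.GZIPOutputStream

import org.apache.spark.util.SizeEstimator

object EstimateVsSpillSize {
  def main(args: Array[String]): Unit = {
    // A map of boxed keys and values, loosely shaped like the records an
    // ExternalAppendOnlyMap holds before spilling (assumed data, not the job's).
    val data: Map[java.lang.Long, Array[java.lang.Long]] =
      (0L until 100000L).map { i =>
        java.lang.Long.valueOf(i) -> Array.fill(8)(java.lang.Long.valueOf(i))
      }.toMap

    // Object-graph estimate of the in-memory size (what the spill log reports).
    val estimated = SizeEstimator.estimate(data)

    // Serialize and compress the same data, approximating a spill file's bytes.
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(new GZIPOutputStream(buffer))
    out.writeObject(data)
    out.close()

    println(s"estimated in-memory size: $estimated bytes")
    println(s"serialized + compressed:  ${buffer.size()} bytes")
  }
}
{code}
The two numbers can differ by well over an order of magnitude, since the 
estimate counts object headers, references and boxing overhead while the spill 
file only holds serialized, compressed payload; whether that fully explains a 
2.2 GB vs 2.2 MB gap is exactly what this ticket questions.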


 EstimateSize  is difference with spill file size
 

 Key: SPARK-6738
 URL: https://issues.apache.org/jira/browse/SPARK-6738
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.2.0
Reporter: Hong Shen

 ExternalAppendOnlyMap spills 2.2 GB of data to disk:
 {code}
 15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: Thread 54 spilling 
 in-memory map of 2.2 GB to disk (61 times so far)
 15/04/07 20:27:37 INFO collection.ExternalAppendOnlyMap: 
 /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
 {code}
 But the file size is only 2.2M.
 {code}
 ll -h 
 /data11/yarnenv/local/usercache/spark/appcache/application_1423737010718_40455651/spark-local-20150407202613-4e80/11/
 total 2.2M
 -rw-r- 1 spark users 2.2M Apr  7 20:27 
 temp_local_fdb4a583-5d13-4394-bccb-e1217d5db812
 {code}
 The GC log shows that the JVM memory is less than 1 GB.
 {code}
 2015-04-07T20:27:08.023+0800: [GC 981981K-55363K(3961344K), 0.0341720 secs]
 2015-04-07T20:27:14.483+0800: [GC 987523K-53737K(3961344K), 0.0252660 secs]
 2015-04-07T20:27:20.793+0800: [GC 985897K-56370K(3961344K), 0.0606460 secs]
 2015-04-07T20:27:27.553+0800: [GC 988530K-59089K(3961344K), 0.0651840 secs]
 2015-04-07T20:27:34.067+0800: [GC 991249K-62153K(3961344K), 0.0288460 secs]
 2015-04-07T20:27:40.180+0800: [GC 994313K-61344K(3961344K), 0.0388970 secs]
 2015-04-07T20:27:46.490+0800: [GC 993504K-59915K(3961344K), 0.0235150 secs]
 {code}
 The estimateSize is hugely different from the spill file size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6104) spark SQL shuffle OOM

2015-03-06 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14350323#comment-14350323
 ] 

Hong Shen commented on SPARK-6104:
--

[~rxin]
Can you link to the duplicate issue?

 spark SQL shuffle OOM
 -

 Key: SPARK-6104
 URL: https://issues.apache.org/jira/browse/SPARK-6104
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Hong Shen

 Currently, Spark shuffle can use ExternalAppendOnlyMap to combine the data it 
 has fetched, but that depends on the ShuffleDependency's aggregator. Spark 
 SQL's shuffle does not define an aggregator, so it easily causes OOM at large 
 data scale.
 In our cluster we use Spark SQL, but it often hits OOM, and it is impractical 
 to retune spark.default.parallelism every day. I think we should add an 
 aggregator for Spark SQL shuffle.
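As an aside, a hedged sketch of the contrast described above (the class name, 
input path and partition counts are illustrative assumptions): an RDD shuffle 
that goes through reduceByKey attaches an Aggregator to its ShuffleDependency, 
so fetched records are merged incrementally and can spill through 
ExternalAppendOnlyMap, while the usual stop-gap on the SQL side is to raise the 
shuffle partition count so each reducer buffers less data:
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

object AggregatedShuffleSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("aggregated-shuffle-sketch")
      // Stop-gap tuning mentioned in the ticket: more, smaller shuffle partitions.
      .set("spark.default.parallelism", "400")
      .set("spark.sql.shuffle.partitions", "400") // partition count used by Spark SQL shuffles
    val sc = new SparkContext(conf)

    // reduceByKey supplies a combine function, so the ShuffleDependency carries
    // an Aggregator and reduce-side records are merged as they are fetched
    // instead of being buffered wholesale.
    val counts = sc.textFile("hdfs:///tmp/sample_input") // hypothetical path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1L))
      .reduceByKey(_ + _, 400)

    counts.take(10).foreach(println)
    sc.stop()
  }
}
{code}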



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6104) spark sql shuffle OOM

2015-03-01 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-6104:
-
Affects Version/s: 1.2.0

 spark sql shuffle OOM
 -

 Key: SPARK-6104
 URL: https://issues.apache.org/jira/browse/SPARK-6104
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Hong Shen

 Currently, Spark shuffle can use ExternalAppendOnlyMap to combine the data it 
 has fetched, but that depends on the ShuffleDependency's aggregator. Spark 
 SQL's shuffle does not define an aggregator, so it easily causes OOM at large 
 data scale.
 In our cluster we use Spark SQL, but it often hits OOM, and it is hard to 
 reset 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6104) spark sql shuffle OOM

2015-03-01 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-6104:
-
Description: 
Currently, Spark shuffle can use ExternalAppendOnlyMap to combine the data it 
has fetched, but that depends on the ShuffleDependency's aggregator. Spark SQL's 
shuffle does not define an aggregator, so it easily causes OOM at large data 
scale.
In our cluster we use Spark SQL, but it often hits OOM, and it is hard to 
reset 

  was:Currently, spark shuffle can use ExternalAppendOnlyMap to combine data 
that have fetched, but it depend on ShuffleDependency's aggregator, while spark 
sql's shuffle have not define aggregator, so it will easily cause OOM in large 
scale of data.


 spark sql shuffle OOM
 -

 Key: SPARK-6104
 URL: https://issues.apache.org/jira/browse/SPARK-6104
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Hong Shen

 Currently, Spark shuffle can use ExternalAppendOnlyMap to combine the data it 
 has fetched, but that depends on the ShuffleDependency's aggregator. Spark 
 SQL's shuffle does not define an aggregator, so it easily causes OOM at large 
 data scale.
 In our cluster we use Spark SQL, but it often hits OOM, and it is hard to 
 reset 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6104) spark sql shuffle OOM

2015-03-01 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-6104:
-
Description: 
Currently, Spark shuffle can use ExternalAppendOnlyMap to combine the data it 
has fetched, but that depends on the ShuffleDependency's aggregator. Spark SQL's 
shuffle does not define an aggregator, so it easily causes OOM at large data 
scale.
In our cluster we use Spark SQL, but it often hits OOM, and it is impractical to 
retune spark.default.parallelism every day. I think we should add an aggregator 
for Spark SQL shuffle.

  was:
Currently, spark shuffle can use ExternalAppendOnlyMap to combine data that 
have fetched, but it depend on ShuffleDependency's aggregator, while spark 
sql's shuffle have not define aggregator, so it will easily cause OOM in large 
scale of data.
In our cluster, we have used spark sql, but it often appear OOM, it's hard to 
reset 


 spark sql shuffle OOM
 -

 Key: SPARK-6104
 URL: https://issues.apache.org/jira/browse/SPARK-6104
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Hong Shen

 Currently, Spark shuffle can use ExternalAppendOnlyMap to combine the data it 
 has fetched, but that depends on the ShuffleDependency's aggregator. Spark 
 SQL's shuffle does not define an aggregator, so it easily causes OOM at large 
 data scale.
 In our cluster we use Spark SQL, but it often hits OOM, and it is impractical 
 to retune spark.default.parallelism every day. I think we should add an 
 aggregator for Spark SQL shuffle.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6104) spark SQL shuffle OOM

2015-03-01 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-6104:
-
Summary: spark SQL shuffle OOM  (was: spark sql shuffle OOM)

 spark SQL shuffle OOM
 -

 Key: SPARK-6104
 URL: https://issues.apache.org/jira/browse/SPARK-6104
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Hong Shen

 Currently, Spark shuffle can use ExternalAppendOnlyMap to combine the data it 
 has fetched, but that depends on the ShuffleDependency's aggregator. Spark 
 SQL's shuffle does not define an aggregator, so it easily causes OOM at large 
 data scale.
 In our cluster we use Spark SQL, but it often hits OOM, and it is impractical 
 to retune spark.default.parallelism every day. I think we should add an 
 aggregator for Spark SQL shuffle.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6104) spark SQL shuffle OOM

2015-03-01 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342863#comment-14342863
 ] 

Hong Shen commented on SPARK-6104:
--

[~rxin] [~sandyryza], could you take a look at this issue?

 spark SQL shuffle OOM
 -

 Key: SPARK-6104
 URL: https://issues.apache.org/jira/browse/SPARK-6104
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Hong Shen

 Currently, Spark shuffle can use ExternalAppendOnlyMap to combine the data it 
 has fetched, but that depends on the ShuffleDependency's aggregator. Spark 
 SQL's shuffle does not define an aggregator, so it easily causes OOM at large 
 data scale.
 In our cluster we use Spark SQL, but it often hits OOM, and it is impractical 
 to retune spark.default.parallelism every day. I think we should add an 
 aggregator for Spark SQL shuffle.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6104) spark SQL shuffle OOM

2015-03-01 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-6104:
-
Description: 
Currently, Spark shuffle can use ExternalAppendOnlyMap to combine the data it 
has fetched, but that depends on the ShuffleDependency's aggregator. Spark SQL's 
shuffle does not define an aggregator, so it easily causes OOM at large data 
scale.
In our cluster we use Spark SQL, but it often hits OOM, and it is impractical to 
retune spark.default.parallelism every day. I think we should add an aggregator 
for Spark SQL shuffle.

  was:
Currently, spark shuffle can use ExternalAppendOnlyMap to combine data that 
have fetched, but it depend on ShuffleDependency's aggregator, while spark 
sql's shuffle have not define aggregator, so it will easily cause OOM in large 
scale of data.
In our cluster, we have used spark sql, but it often appear OOM, it's hard to 
reset spark.default.parallelism every day. I think we should add aggregator for 
spark sql shuffle.


 spark SQL shuffle OOM
 -

 Key: SPARK-6104
 URL: https://issues.apache.org/jira/browse/SPARK-6104
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Hong Shen

 Currently, Spark shuffle can use ExternalAppendOnlyMap to combine the data it 
 has fetched, but that depends on the ShuffleDependency's aggregator. Spark 
 SQL's shuffle does not define an aggregator, so it easily causes OOM at large 
 data scale.
 In our cluster we use Spark SQL, but it often hits OOM, and it is impractical 
 to retune spark.default.parallelism every day. I think we should add an 
 aggregator for Spark SQL shuffle.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6104) spark SQL shuffle OOM

2015-03-01 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342872#comment-14342872
 ] 

Hong Shen commented on SPARK-6104:
--

Can you link to the Duplicate issue?

 spark SQL shuffle OOM
 -

 Key: SPARK-6104
 URL: https://issues.apache.org/jira/browse/SPARK-6104
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Hong Shen

 Currently, Spark shuffle can use ExternalAppendOnlyMap to combine the data it 
 has fetched, but that depends on the ShuffleDependency's aggregator. Spark 
 SQL's shuffle does not define an aggregator, so it easily causes OOM at large 
 data scale.
 In our cluster we use Spark SQL, but it often hits OOM, and it is impractical 
 to retune spark.default.parallelism every day. I think we should add an 
 aggregator for Spark SQL shuffle.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-6104) spark SQL shuffle OOM

2015-03-01 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14342872#comment-14342872
 ] 

Hong Shen edited comment on SPARK-6104 at 3/2/15 7:25 AM:
--

Can you link to the duplicate issue?


was (Author: shenhong):
Can you link to the Duplicate issue?

 spark SQL shuffle OOM
 -

 Key: SPARK-6104
 URL: https://issues.apache.org/jira/browse/SPARK-6104
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Hong Shen

 Currently, Spark shuffle can use ExternalAppendOnlyMap to combine the data it 
 has fetched, but that depends on the ShuffleDependency's aggregator. Spark 
 SQL's shuffle does not define an aggregator, so it easily causes OOM at large 
 data scale.
 In our cluster we use Spark SQL, but it often hits OOM, and it is impractical 
 to retune spark.default.parallelism every day. I think we should add an 
 aggregator for Spark SQL shuffle.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6104) spark sql shuffle OOM

2015-03-01 Thread Hong Shen (JIRA)
Hong Shen created SPARK-6104:


 Summary: spark sql shuffle OOM
 Key: SPARK-6104
 URL: https://issues.apache.org/jira/browse/SPARK-6104
 Project: Spark
  Issue Type: Improvement
Reporter: Hong Shen


Currently, Spark shuffle can use ExternalAppendOnlyMap to combine the data it 
has fetched, but that depends on the ShuffleDependency's aggregator. Spark SQL's 
shuffle does not define an aggregator, so it easily causes OOM at large data 
scale.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5736) Add executor log url

2015-02-11 Thread Hong Shen (JIRA)
Hong Shen created SPARK-5736:


 Summary: Add executor log url
 Key: SPARK-5736
 URL: https://issues.apache.org/jira/browse/SPARK-5736
 Project: Spark
  Issue Type: Bug
Reporter: Hong Shen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5736) Add executor log url to Executors page on Yarn

2015-02-11 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-5736:
-
Summary: Add executor log url to Executors page on Yarn  (was: Add executor 
log url to Executors page)

 Add executor log url to Executors page on Yarn
 --

 Key: SPARK-5736
 URL: https://issues.apache.org/jira/browse/SPARK-5736
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 1.2.0
Reporter: Hong Shen

 Currently, there is no executor log URL in the Spark UI (on YARN), so we have 
 to read executor logs by logging in to the machine the executor runs on. I 
 think we should add an executor log URL to the Executors page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5736) Add executor log url to Executors page

2015-02-11 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-5736:
-
Component/s: Web UI

 Add executor log url to Executors page
 --

 Key: SPARK-5736
 URL: https://issues.apache.org/jira/browse/SPARK-5736
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 1.2.0
Reporter: Hong Shen





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5736) Add executor log url to Executors page

2015-02-11 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-5736:
-
Affects Version/s: 1.2.0

 Add executor log url to Executors page
 --

 Key: SPARK-5736
 URL: https://issues.apache.org/jira/browse/SPARK-5736
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 1.2.0
Reporter: Hong Shen





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5736) Add executor log url to Executors page

2015-02-11 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-5736:
-
Summary: Add executor log url to Executors page  (was: Add executor log url)

 Add executor log url to Executors page
 --

 Key: SPARK-5736
 URL: https://issues.apache.org/jira/browse/SPARK-5736
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.2.0
Reporter: Hong Shen





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5736) Add executor log url to Executors page

2015-02-11 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-5736:
-
Issue Type: Improvement  (was: Bug)

 Add executor log url to Executors page
 --

 Key: SPARK-5736
 URL: https://issues.apache.org/jira/browse/SPARK-5736
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 1.2.0
Reporter: Hong Shen





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5736) Add executor log url to Executors page

2015-02-11 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-5736:
-
Description: Currently, there is no executor log URL in the Spark UI (on 
YARN), so we have to read executor logs by logging in to the machine the 
executor runs on. I think we should add an executor log URL to the Executors 
page.

 Add executor log url to Executors page
 --

 Key: SPARK-5736
 URL: https://issues.apache.org/jira/browse/SPARK-5736
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 1.2.0
Reporter: Hong Shen

 Currently, there is no executor log URL in the Spark UI (on YARN), so we have 
 to read executor logs by logging in to the machine the executor runs on. I 
 think we should add an executor log URL to the Executors page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5529) Executor is still hold while BlockManager has been removed

2015-02-04 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304819#comment-14304819
 ] 

Hong Shen commented on SPARK-5529:
--

I have already changed it in our own branch.
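Not the attached SPARK-5529.patch, just a minimal driver-side sketch of the 
idea (the class name and registration point are assumptions): when the master 
drops a BlockManager for missed heartbeats, ask the scheduler to drop the 
matching executor as well, instead of waiting for the Akka disassociation much 
later:
{code}
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{SparkListener, SparkListenerBlockManagerRemoved}

class RemoveExecutorOnBlockManagerLoss(sc: SparkContext) extends SparkListener {
  override def onBlockManagerRemoved(event: SparkListenerBlockManagerRemoved): Unit = {
    val executorId = event.blockManagerId.executorId
    // The driver's own block manager uses a special executor id ("<driver>" in
    // 1.x, "driver" in later versions); never try to kill that one.
    if (executorId != "<driver>" && executorId != "driver") {
      // DeveloperApi on SparkContext (1.2+): asks the cluster manager to release
      // the executor so the scheduler stops treating it as alive.
      sc.killExecutor(executorId)
    }
  }
}

// Registration on the driver, right after the SparkContext is created:
// sc.addSparkListener(new RemoveExecutorOnBlockManagerLoss(sc))
{code}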

 Executor is still hold while BlockManager has been removed
 --

 Key: SPARK-5529
 URL: https://issues.apache.org/jira/browse/SPARK-5529
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.2.0
Reporter: Hong Shen
 Attachments: SPARK-5529.patch


 When I run a Spark job, one executor hangs; after 120s its BlockManager is 
 removed by the driver, but it takes another half an hour before the executor 
 itself is removed by the driver. Here is the log:
 {code}
 15/02/02 14:58:43 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(1, 10.215.143.14, 47234) with no recent heart beats: 147198ms 
 exceeds 120000ms
 
 15/02/02 15:26:55 ERROR YarnClientClusterScheduler: Lost executor 1 on 
 10.215.143.14: remote Akka client disassociated
 15/02/02 15:26:55 WARN ReliableDeliverySupervisor: Association with remote 
 system [akka.tcp://sparkExecutor@10.215.143.14:46182] has failed, address is 
 now gated for [5000] ms. Reason is: [Disassociated].
 15/02/02 15:26:55 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 
 0.0
 15/02/02 15:26:55 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3, 
 10.215.143.14): ExecutorLostFailure (executor 1 lost)
 15/02/02 15:26:55 ERROR YarnClientSchedulerBackend: Asked to remove 
 non-existent executor 1
 15/02/02 15:26:55 INFO DAGScheduler: Executor lost: 1 (epoch 0)
 15/02/02 15:26:55 INFO BlockManagerMasterActor: Trying to remove executor 1 
 from BlockManagerMaster.
 15/02/02 15:26:55 INFO BlockManagerMaster: Removed 1 successfully in 
 removeExecutor
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5529) Executor is still hold while BlockManager has been removed

2015-02-04 Thread Hong Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Shen updated SPARK-5529:
-
Attachment: SPARK-5529.patch

 Executor is still hold while BlockManager has been removed
 --

 Key: SPARK-5529
 URL: https://issues.apache.org/jira/browse/SPARK-5529
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.2.0
Reporter: Hong Shen
 Attachments: SPARK-5529.patch


 When I run a Spark job, one executor hangs; after 120s its BlockManager is 
 removed by the driver, but it takes another half an hour before the executor 
 itself is removed by the driver. Here is the log:
 {code}
 15/02/02 14:58:43 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(1, 10.215.143.14, 47234) with no recent heart beats: 147198ms 
 exceeds 120000ms
 
 15/02/02 15:26:55 ERROR YarnClientClusterScheduler: Lost executor 1 on 
 10.215.143.14: remote Akka client disassociated
 15/02/02 15:26:55 WARN ReliableDeliverySupervisor: Association with remote 
 system [akka.tcp://sparkExecutor@10.215.143.14:46182] has failed, address is 
 now gated for [5000] ms. Reason is: [Disassociated].
 15/02/02 15:26:55 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 
 0.0
 15/02/02 15:26:55 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3, 
 10.215.143.14): ExecutorLostFailure (executor 1 lost)
 15/02/02 15:26:55 ERROR YarnClientSchedulerBackend: Asked to remove 
 non-existent executor 1
 15/02/02 15:26:55 INFO DAGScheduler: Executor lost: 1 (epoch 0)
 15/02/02 15:26:55 INFO BlockManagerMasterActor: Trying to remove executor 1 
 from BlockManagerMaster.
 15/02/02 15:26:55 INFO BlockManagerMaster: Removed 1 successfully in 
 removeExecutor
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5529) Executor is still hold while BlockManager has been removed

2015-02-03 Thread Hong Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304433#comment-14304433
 ] 

Hong Shen commented on SPARK-5529:
--

The executor is only marked lost when Akka throws a DisassociatedEvent.

 Executor is still hold while BlockManager has been removed
 --

 Key: SPARK-5529
 URL: https://issues.apache.org/jira/browse/SPARK-5529
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.2.0
Reporter: Hong Shen

 When I run a Spark job, one executor hangs; after 120s its BlockManager is 
 removed by the driver, but it takes another half an hour before the executor 
 itself is removed by the driver. Here is the log:
 15/02/02 14:58:43 WARN BlockManagerMasterActor: Removing BlockManager 
 BlockManagerId(1, 10.215.143.14, 47234) with no recent heart beats: 147198ms 
 exceeds 120000ms
 
 15/02/02 15:26:55 ERROR YarnClientClusterScheduler: Lost executor 1 on 
 10.215.143.14: remote Akka client disassociated
 15/02/02 15:26:55 WARN ReliableDeliverySupervisor: Association with remote 
 system [akka.tcp://sparkExecutor@10.215.143.14:46182] has failed, address is 
 now gated for [5000] ms. Reason is: [Disassociated].
 15/02/02 15:26:55 INFO TaskSetManager: Re-queueing tasks for 1 from TaskSet 
 0.0
 15/02/02 15:26:55 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 3, 
 10.215.143.14): ExecutorLostFailure (executor 1 lost)
 15/02/02 15:26:55 ERROR YarnClientSchedulerBackend: Asked to remove 
 non-existent executor 1
 15/02/02 15:26:55 INFO DAGScheduler: Executor lost: 1 (epoch 0)
 15/02/02 15:26:55 INFO BlockManagerMasterActor: Trying to remove executor 1 
 from BlockManagerMaster.
 15/02/02 15:26:55 INFO BlockManagerMaster: Removed 1 successfully in 
 removeExecutor



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


