[jira] [Updated] (SPARK-18745) java.lang.IndexOutOfBoundsException running query 68 Spark SQL on (100TB)

2016-12-06 Thread JESSE CHEN (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JESSE CHEN updated SPARK-18745:
---
Description: 
Running query 68 with decreased executor memory (12GB executors instead of 
24GB) against a 100TB Parquet database, using a Spark master build dated 
11/04, gave a java.lang.IndexOutOfBoundsException.
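As an aside (not part of the original report), a run with the reduced executor memory described above could be launched along these lines; the master URL, runner class, jar, and query-file names are illustrative placeholders, not taken from the thread:

```shell
# Hypothetical launch approximating the reported setup: 12g executors
# (down from the 24g baseline) running a TPC-DS query against the 100TB
# Parquet database. Class, jar, and query file are placeholders.
spark-submit \
  --master yarn \
  --executor-memory 12g \
  --class com.example.tpcds.RunQuery \
  tpcds-runner.jar query68.sql
```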

The query is as follows:
{noformat}
select  c_last_name
       ,c_first_name
       ,ca_city
       ,bought_city
       ,ss_ticket_number
       ,extended_price
       ,extended_tax
       ,list_price
  from (select ss_ticket_number
              ,ss_customer_sk
              ,ca_city bought_city
              ,sum(ss_ext_sales_price) extended_price
              ,sum(ss_ext_list_price) list_price
              ,sum(ss_ext_tax) extended_tax
          from store_sales
              ,date_dim
              ,store
              ,household_demographics
              ,customer_address
         where store_sales.ss_sold_date_sk = date_dim.d_date_sk
           and store_sales.ss_store_sk = store.s_store_sk
           and store_sales.ss_hdemo_sk = household_demographics.hd_demo_sk
           and store_sales.ss_addr_sk = customer_address.ca_address_sk
           and date_dim.d_dom between 1 and 2
           and (household_demographics.hd_dep_count = 8 or
                household_demographics.hd_vehicle_count = -1)
           and date_dim.d_year in (2000, 2000+1, 2000+2)
           and store.s_city in ('Plainview', 'Rogers')
         group by ss_ticket_number
                 ,ss_customer_sk
                 ,ss_addr_sk
                 ,ca_city) dn
      ,customer
      ,customer_address current_addr
 where ss_customer_sk = c_customer_sk
   and customer.c_current_addr_sk = current_addr.ca_address_sk
   and current_addr.ca_city <> bought_city
 order by c_last_name
         ,ss_ticket_number
 limit 100
{noformat}

Spark output showing the exception (the stack trace is truncated in the archive):
{noformat}
org.apache.spark.SparkException: Exception thrown in awaitResult: 
  at org.apache.spark.util.ThreadUtils$.awaitResultInForkJoinSafely(ThreadUtils.scala:215)
  at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:124)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:124)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
  at org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:123)
  at org.apache.spark.sql.execution.exchange.ReusedExchangeExec.doExecuteBroadcast(Exchange.scala:61)
  at org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:231)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:124)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:124)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
  at org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:123)
  at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98)
  at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197)
  at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82)
  at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
  at org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:36)
  at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:68)
  at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
  at org.apache.spark.sql.execution.joins.SortMergeJoinExec.consume(SortMergeJoinExec.scala:35)
  at org.apache.spark.sql.execution.joins.SortMergeJoinExec.doProduce(SortMergeJoinExec.scala:560)
  at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
  at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org
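{noformat}
The trace ends inside BroadcastExchangeExec / BroadcastHashJoinExec. As a hedged aside not taken from the thread, one way to probe whether the broadcast-join path is implicated is to disable automatic broadcast joins so the planner falls back to sort-merge joins; spark.sql.autoBroadcastJoinThreshold is a standard Spark SQL setting, while query68.sql is a placeholder file name:

```shell
# Not from the report: setting the broadcast threshold to -1 disables
# automatic broadcast joins entirely, forcing sort-merge joins instead.
spark-sql --conf spark.sql.autoBroadcastJoinThreshold=-1 -f query68.sql
```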

[jira] [Updated] (SPARK-18745) java.lang.IndexOutOfBoundsException running query 68 Spark SQL on (100TB)

2016-12-06 Thread JESSE CHEN (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

JESSE CHEN updated SPARK-18745:
---
Labels:   (was: core dump)

> java.lang.IndexOutOfBoundsException running query 68 Spark SQL on (100TB)
> -
>
> Key: SPARK-18745
> URL: https://issues.apache.org/jira/browse/SPARK-18745
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: JESSE CHEN
>Assignee: Kazuaki Ishizaki
>Priority: Critical
> Fix For: 2.1.0

[jira] [Updated] (SPARK-18745) java.lang.IndexOutOfBoundsException running query 68 Spark SQL on (100TB)

2016-12-06 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-18745:
-
Affects Version/s: 2.2.0


[jira] [Updated] (SPARK-18745) java.lang.IndexOutOfBoundsException running query 68 Spark SQL on (100TB)

2016-12-07 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-18745:

Target Version/s:   (was: 2.1.0)


[jira] [Updated] (SPARK-18745) java.lang.IndexOutOfBoundsException running query 68 Spark SQL on (100TB)

2016-12-08 Thread Dongjoon Hyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-18745:
--
Fix Version/s: (was: 2.1.0)


[jira] [Updated] (SPARK-18745) java.lang.IndexOutOfBoundsException running query 68 Spark SQL on (100TB)

2016-12-14 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-18745:

Fix Version/s: 2.1.0  (was: 2.1.1)
