[jira] [Commented] (SPARK-30442) Write mode ignored when using CodecStreams

2020-02-27 Thread Abhishek Madav (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047024#comment-17047024
 ] 

Abhishek Madav commented on SPARK-30442:


In the case of task failures, say the task fails while writing to local disk or is 
interrupted, the file is left empty but already materialized on the file system. The 
next task attempt that retries the write to this location sees the existing file and 
fails with a FileAlreadyExistsException, so the write is not resilient to task failures.
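
To make the failure mode concrete, here is a minimal sketch of the retry scenario at 
the Hadoop FileSystem level; the output path and configuration below are illustrative 
assumptions, not taken from the actual Spark task code:

{code:java}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
val path = new Path("/tmp/job-output/part-00000.csv.gz") // hypothetical task output file
val fs: FileSystem = path.getFileSystem(conf)

// First attempt: the file is created (possibly empty), then the task dies or is interrupted.
val firstAttempt = fs.create(path, /* overwrite = */ false)
firstAttempt.close() // the empty file is now materialized on the file system

// Retried attempt: CodecStreams still passes overwrite = false, so the create call
// fails with a FileAlreadyExistsException instead of replacing the leftover file.
val retriedAttempt = fs.create(path, /* overwrite = */ false)
{code}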

> Write mode ignored when using CodecStreams
> --
>
> Key: SPARK-30442
> URL: https://issues.apache.org/jira/browse/SPARK-30442
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.4.4
>Reporter: Jesse Collins
>Priority: Major
>
> Overwrite is hardcoded to false in the codec stream. This can cause issues, 
> particularly with AWS tools, that make it impossible to retry a failed write.
> Ideally, this flag should be read from the write mode set for the DataWriter that 
> is writing through this codec class.
> [https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CodecStreams.scala#L81]
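
A hedged sketch of the direction suggested above: derive the overwrite flag from the 
caller's SaveMode instead of passing a literal false. The helper below is hypothetical 
plumbing for illustration, not the actual CodecStreams API:

{code:java}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FSDataOutputStream, Path}
import org.apache.spark.sql.SaveMode

// Hypothetical helper: translate the writer's SaveMode into the Hadoop overwrite flag.
def overwriteFor(mode: SaveMode): Boolean = mode == SaveMode.Overwrite

// Hypothetical call site: the flag is threaded through instead of being hardcoded to false.
def createOutputStream(conf: Configuration, file: Path, mode: SaveMode): FSDataOutputStream = {
  val fs = file.getFileSystem(conf)
  fs.create(file, overwriteFor(mode))
}
{code}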



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24864) Cannot resolve auto-generated column ordinals in a hive view

2018-07-20 Thread Abhishek Madav (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551296#comment-16551296
 ] 

Abhishek Madav commented on SPARK-24864:


Thanks for the reply. The views are currently created by the customer, and the 
Spark job hasn't been able to keep up with the upgrade from 1.6 to 2.0+, hence they 
feel this is a regression. Is there anything that can be done to go back to the 1.6 
way of column referencing?
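
One possible workaround, offered here as an assumption rather than anything confirmed 
in this ticket, is to stop relying on the auto-generated _c1 ordinal and recreate the 
view with an explicit alias in the inner query, so the reference resolves the same way 
on 1.6 and 2.x:

{code:java}
// Hypothetical rework of the view: alias upper(name) explicitly so the outer
// query never needs the auto-generated `_c1` column name.
spark.sql("""
  CREATE OR REPLACE VIEW vsrc1new AS
  SELECT id, uname
  FROM (SELECT id, upper(name) AS uname FROM src1) t
""")

spark.sql("SELECT * FROM vsrc1new").show()
{code}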

> Cannot resolve auto-generated column ordinals in a hive view
> 
>
> Key: SPARK-24864
> URL: https://issues.apache.org/jira/browse/SPARK-24864
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1, 2.1.0
>Reporter: Abhishek Madav
>Priority: Major
>
> A Spark job reading from a Hive view fails with an AnalysisException when 
> resolving auto-generated column ordinals.
> *Exception*:
> {code:java}
> scala> spark.sql("Select * from vsrc1new").show
> org.apache.spark.sql.AnalysisException: cannot resolve '`vsrc1new._c1`' given 
> input columns: [id, upper(name)]; line 1 pos 24;
> 'Project [*]
> +- 'SubqueryAlias vsrc1new, `default`.`vsrc1new`
>    +- 'Project [id#634, 'vsrc1new._c1 AS uname#633]
>   +- SubqueryAlias vsrc1new
>  +- Project [id#634, upper(name#635) AS upper(name)#636]
>     +- MetastoreRelation default, src1
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:310)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:310)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309)
> {code}
> *Steps to reproduce:*
> 1: Create a simple table, say src
> {code:java}
> CREATE TABLE `src1`(`id` int,  `name` string) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY ','
> {code}
> 2: Create a view, say with name vsrc1new
> {code:java}
> CREATE VIEW vsrc1new AS SELECT id, `_c1` AS uname FROM (SELECT id, 
> upper(name) FROM src1) vsrc1new;
> {code}
> 3. Selecting data from this view in hive-cli/beeline doesn't cause any error.
> 4. Creating a dataframe using:
> {code:java}
> spark.sql("Select * from vsrc1new").show //throws error
> {code}
> The auto-generated column names for the view are not resolved. Am I possibly 
> missing some spark-sql configuration here? I tried the repro-case against 
> spark 1.6 and that worked fine. Any inputs are appreciated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Updated] (SPARK-24864) Cannot resolve auto-generated column ordinals in a hive view

2018-07-19 Thread Abhishek Madav (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Madav updated SPARK-24864:
---
Description: 
A Spark job reading from a Hive view fails with an AnalysisException when resolving 
auto-generated column ordinals.

*Exception*:
{code:java}
scala> spark.sql("Select * from vsrc1new").show
org.apache.spark.sql.AnalysisException: cannot resolve '`vsrc1new._c1`' given 
input columns: [id, upper(name)]; line 1 pos 24;
'Project [*]
+- 'SubqueryAlias vsrc1new, `default`.`vsrc1new`
   +- 'Project [id#634, 'vsrc1new._c1 AS uname#633]
  +- SubqueryAlias vsrc1new
 +- Project [id#634, upper(name#635) AS upper(name)#636]
    +- MetastoreRelation default, src1

  at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77)
  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:310)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:310)
  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309)
{code}
*Steps to reproduce:*

1: Create a simple table, say src
{code:java}
CREATE TABLE `src1`(`id` int,  `name` string) ROW FORMAT DELIMITED FIELDS 
TERMINATED BY ','
{code}
2: Create a view, say with name vsrc1new
{code:java}
CREATE VIEW vsrc1new AS SELECT id, `_c1` AS uname FROM (SELECT id, upper(name) 
FROM src1) vsrc1new;
{code}
3. Selecting data from this view in hive-cli/beeline doesn't cause any error.

4. Creating a dataframe using:
{code:java}
spark.sql("Select * from vsrc1new").show //throws error
{code}
The auto-generated column names for the view are not resolved. Am I possibly 
missing some spark-sql configuration here? I tried the repro-case against spark 
1.6 and that worked fine. Any inputs are appreciated.

  was:
A Spark job reading from a Hive view fails with an AnalysisException when resolving 
auto-generated column ordinals.

*Exception*:
{code:java}
scala> spark.sql("Select * from vsrc1new").show
org.apache.spark.sql.AnalysisException: cannot resolve '`vsrc1new._c1`' given 
input columns: [id, upper(name)]; line 1 pos 24;
'Project [*]
+- 'SubqueryAlias vsrc1new, `default`.`vsrc1new`
   +- 'Project [id#634, 'vsrc1new._c1 AS uname#633]
  +- SubqueryAlias vsrc1new
 +- Project [id#634, upper(name#635) AS upper(name)#636]
    +- MetastoreRelation default, src1

  at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77)
  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:310)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:310)
  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309)
{code}
Steps to reproduce:

1: Create a simple table, say src
{code:java}
CREATE TABLE `src1`(`id` int,  `name` string) ROW FORMAT DELIMITED FIELDS 
TERMINATED BY ','
{code}
2: Create a view, say with name vsrc1new
{code:java}
CREATE VIEW vsrc1new AS SELECT id, `_c1` AS uname FROM (SELECT id, upper(name) 
FROM src1) vsrc1new;
{code}
3. Selecting data from this view in hive-cli/beeline doesn't cause any error.

4. Creating a dataframe using:
{code:java}
spark.sql("Select * from vsrc1new").show //throws error
{code}
The auto-generated column names for the view are not resolved. Am I possibly 
missing some spark-sql configuration here? I tried the repro-case against spark 
1.6 and that worked fine. Any inputs are appreciated.


> Cannot resolve auto-generated column ordinals in a hive view
> 
>
> Key: SPARK-24864
> URL: https://issues.apache.org/jira/browse/SPARK-24864
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1, 2.1.0
>Reporter: Abhishek Madav
>Priority: Major
> Fix For: 2.4.0
>
>
> A Spark job reading from a Hive view fails with an AnalysisException when 
> resolving auto-generated column ordinals.
> *Exception*:
> {code:java}
> scala> spark.sql("Select * from 

[jira] [Updated] (SPARK-24864) Cannot resolve auto-generated column ordinals in a hive view

2018-07-19 Thread Abhishek Madav (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Madav updated SPARK-24864:
---
Summary: Cannot resolve auto-generated column ordinals in a hive view  
(was: Cannot reference auto-generated column ordinals in a hive view)

> Cannot resolve auto-generated column ordinals in a hive view
> 
>
> Key: SPARK-24864
> URL: https://issues.apache.org/jira/browse/SPARK-24864
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1, 2.1.0
>Reporter: Abhishek Madav
>Priority: Major
> Fix For: 2.4.0
>
>
> A Spark job reading from a Hive view fails with an AnalysisException when 
> resolving auto-generated column ordinals.
> *Exception*:
> {code:java}
> scala> spark.sql("Select * from vsrc1new").show
> org.apache.spark.sql.AnalysisException: cannot resolve '`vsrc1new._c1`' given 
> input columns: [id, upper(name)]; line 1 pos 24;
> 'Project [*]
> +- 'SubqueryAlias vsrc1new, `default`.`vsrc1new`
>    +- 'Project [id#634, 'vsrc1new._c1 AS uname#633]
>   +- SubqueryAlias vsrc1new
>  +- Project [id#634, upper(name#635) AS upper(name)#636]
>     +- MetastoreRelation default, src1
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:310)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:310)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309)
> {code}
> Steps to reproduce:
> 1: Create a simple table, say src
> {code:java}
> CREATE TABLE `src1`(`id` int,  `name` string) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY ','
> {code}
> 2: Create a view, say with name vsrc1new
> {code:java}
> CREATE VIEW vsrc1new AS SELECT id, `_c1` AS uname FROM (SELECT id, 
> upper(name) FROM src1) vsrc1new;
> {code}
> 3. Selecting data from this view in hive-cli/beeline doesn't cause any error.
> 4. Creating a dataframe using:
> {code:java}
> spark.sql("Select * from vsrc1new").show //throws error
> {code}
> The auto-generated column names for the view are not resolved. Am I possibly 
> missing some spark-sql configuration here? I tried the repro-case against 
> spark 1.6 and that worked fine. Any inputs are appreciated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Updated] (SPARK-24864) Cannot reference auto-generated column ordinals in a hive view

2018-07-19 Thread Abhishek Madav (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Madav updated SPARK-24864:
---
Summary: Cannot reference auto-generated column ordinals in a hive view  
(was: Cannot reference auto-generated column ordinals in a hive-view. )

> Cannot reference auto-generated column ordinals in a hive view
> --
>
> Key: SPARK-24864
> URL: https://issues.apache.org/jira/browse/SPARK-24864
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1, 2.1.0
>Reporter: Abhishek Madav
>Priority: Major
> Fix For: 2.4.0
>
>
> A Spark job reading from a Hive view fails with an AnalysisException when 
> resolving auto-generated column ordinals.
> *Exception*:
> {code:java}
> scala> spark.sql("Select * from vsrc1new").show
> org.apache.spark.sql.AnalysisException: cannot resolve '`vsrc1new._c1`' given 
> input columns: [id, upper(name)]; line 1 pos 24;
> 'Project [*]
> +- 'SubqueryAlias vsrc1new, `default`.`vsrc1new`
>    +- 'Project [id#634, 'vsrc1new._c1 AS uname#633]
>   +- SubqueryAlias vsrc1new
>  +- Project [id#634, upper(name#635) AS upper(name)#636]
>     +- MetastoreRelation default, src1
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:310)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:310)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309)
> {code}
> Steps to reproduce:
> 1: Create a simple table, say src
> {code:java}
> CREATE TABLE `src1`(`id` int,  `name` string) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY ','
> {code}
> 2: Create a view, say with name vsrc1new
> {code:java}
> CREATE VIEW vsrc1new AS SELECT id, `_c1` AS uname FROM (SELECT id, 
> upper(name) FROM src1) vsrc1new;
> {code}
> 3. Selecting data from this view in hive-cli/beeline doesn't cause any error.
> 4. Creating a dataframe using:
> {code:java}
> spark.sql("Select * from vsrc1new").show //throws error
> {code}
> The auto-generated column names for the view are not resolved. Am I possibly 
> missing some spark-sql configuration here? I tried the repro-case against 
> spark 1.6 and that worked fine. Any inputs are appreciated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Updated] (SPARK-24864) Cannot reference auto-generated column ordinals in a hive-view.

2018-07-19 Thread Abhishek Madav (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Madav updated SPARK-24864:
---
Description: 
A Spark job reading from a Hive view fails with an AnalysisException when resolving 
auto-generated column ordinals.

*Exception*:
{code:java}
scala> spark.sql("Select * from vsrc1new").show
org.apache.spark.sql.AnalysisException: cannot resolve '`vsrc1new._c1`' given 
input columns: [id, upper(name)]; line 1 pos 24;
'Project [*]
+- 'SubqueryAlias vsrc1new, `default`.`vsrc1new`
   +- 'Project [id#634, 'vsrc1new._c1 AS uname#633]
  +- SubqueryAlias vsrc1new
 +- Project [id#634, upper(name#635) AS upper(name)#636]
    +- MetastoreRelation default, src1

  at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77)
  at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:310)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:310)
  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309)
{code}
Steps to reproduce:

1: Create a simple table, say src
{code:java}
CREATE TABLE `src1`(`id` int,  `name` string) ROW FORMAT DELIMITED FIELDS 
TERMINATED BY ','
{code}
2: Create a view, say with name vsrc1new
{code:java}
CREATE VIEW vsrc1new AS SELECT id, `_c1` AS uname FROM (SELECT id, upper(name) 
FROM src1) vsrc1new;
{code}
3. Selecting data from this view in hive-cli/beeline doesn't cause any error.

4. Creating a dataframe using:
{code:java}
spark.sql("Select * from vsrc1new").show //throws error
{code}
The auto-generated column names for the view are not resolved. Am I possibly 
missing some spark-sql configuration here? I tried the repro-case against spark 
1.6 and that worked fine. Any inputs are appreciated.

> Cannot reference auto-generated column ordinals in a hive-view. 
> 
>
> Key: SPARK-24864
> URL: https://issues.apache.org/jira/browse/SPARK-24864
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1, 2.1.0
>Reporter: Abhishek Madav
>Priority: Major
> Fix For: 2.4.0
>
>
> A Spark job reading from a Hive view fails with an AnalysisException when 
> resolving auto-generated column ordinals.
> *Exception*:
> {code:java}
> scala> spark.sql("Select * from vsrc1new").show
> org.apache.spark.sql.AnalysisException: cannot resolve '`vsrc1new._c1`' given 
> input columns: [id, upper(name)]; line 1 pos 24;
> 'Project [*]
> +- 'SubqueryAlias vsrc1new, `default`.`vsrc1new`
>    +- 'Project [id#634, 'vsrc1new._c1 AS uname#633]
>   +- SubqueryAlias vsrc1new
>  +- Project [id#634, upper(name#635) AS upper(name)#636]
>     +- MetastoreRelation default, src1
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:310)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:310)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309)
> {code}
> Steps to reproduce:
> 1: Create a simple table, say src
> {code:java}
> CREATE TABLE `src1`(`id` int,  `name` string) ROW FORMAT DELIMITED FIELDS 
> TERMINATED BY ','
> {code}
> 2: Create a view, say with name vsrc1new
> {code:java}
> CREATE VIEW vsrc1new AS SELECT id, `_c1` AS uname FROM (SELECT id, 
> upper(name) FROM src1) vsrc1new;
> {code}
> 3. Selecting data from this view in hive-cli/beeline doesn't cause any error.
> 4. Creating a dataframe using:
> {code:java}
> spark.sql("Select * from vsrc1new").show //throws error
> {code}
> The auto-generated column names for the view are not resolved. Am I possibly 
> missing some spark-sql configuration here? I tried the repro-case against 
> spark 1.6 and that worked fine. Any inputs are appreciated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SPARK-24864) Cannot reference auto-generated column ordinals in a hive-view.

2018-07-19 Thread Abhishek Madav (JIRA)
Abhishek Madav created SPARK-24864:
--

 Summary: Cannot reference auto-generated column ordinals in a 
hive-view. 
 Key: SPARK-24864
 URL: https://issues.apache.org/jira/browse/SPARK-24864
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0, 2.0.1
Reporter: Abhishek Madav
 Fix For: 2.4.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Updated] (SPARK-20697) MSCK REPAIR TABLE resets the Storage Information for bucketed hive tables.

2018-03-20 Thread Abhishek Madav (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Madav updated SPARK-20697:
---
Priority: Critical  (was: Major)

> MSCK REPAIR TABLE resets the Storage Information for bucketed hive tables.
> --
>
> Key: SPARK-20697
> URL: https://issues.apache.org/jira/browse/SPARK-20697
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.0, 2.2.1, 2.3.0
>Reporter: Abhishek Madav
>Priority: Critical
>
> MSCK REPAIR TABLE, when used to recover partitions for a partitioned+bucketed 
> table, does not restore the bucketing information to the storage descriptor in the 
> metastore. 
> Steps to reproduce:
> 1) Create a partitioned+bucketed table in Hive: CREATE TABLE partbucket(a int) 
> PARTITIONED BY (b int) CLUSTERED BY (a) INTO 10 BUCKETS ROW FORMAT DELIMITED 
> FIELDS TERMINATED BY ',';
> 2) In Hive-CLI issue a desc formatted for the table.
> # col_name  data_type   comment 
>
> a int 
>
> # Partition Information
> # col_name  data_type   comment 
>
> b int 
>
> # Detailed Table Information   
> Database: sparkhivebucket  
> Owner:devbld   
> CreateTime:   Wed May 10 10:31:07 PDT 2017 
> LastAccessTime:   UNKNOWN  
> Protect Mode: None 
> Retention:0
> Location: hdfs://localhost:8020/user/hive/warehouse/partbucket 
> Table Type:   MANAGED_TABLE
> Table Parameters:  
>   transient_lastDdlTime   1494437467  
>
> # Storage Information  
> SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
>  
> InputFormat:  org.apache.hadoop.mapred.TextInputFormat 
> OutputFormat: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
> Compressed:   No   
> Num Buckets:  10   
> Bucket Columns:   [a]  
> Sort Columns: []   
> Storage Desc Params:   
>   field.delim ,   
>   serialization.format, 
> 3) In spark-shell, 
> scala> spark.sql("MSCK REPAIR TABLE partbucket")
> 4) Back to Hive-CLI 
> desc formatted partbucket;
> # col_name  data_type   comment 
>
> a int 
>
> # Partition Information
> # col_name  data_type   comment 
>
> b int 
>
> # Detailed Table Information   
> Database: sparkhivebucket  
> Owner:devbld   
> CreateTime:   Wed May 10 10:31:07 PDT 2017 
> LastAccessTime:   UNKNOWN  
> Protect Mode: None 
> Retention:0
> Location: 
> hdfs://localhost:8020/user/hive/warehouse/sparkhivebucket.db/partbucket 
> Table Type:   MANAGED_TABLE
> Table Parameters:  
>   spark.sql.partitionProvider catalog 
>   transient_lastDdlTime   1494437647  
>
> # Storage Information  
> SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
>  
> InputFormat:  org.apache.hadoop.mapred.TextInputFormat 
> OutputFormat: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
> Compressed:   No   
> Num Buckets:  -1   
> Bucket Columns:   []   
> Sort Columns: []   
> Storage Desc Params:   
>   field.delim ,   
>   serialization.format, 
> Further inserts to this table cannot be made in bucketed fashion through 
> Hive. 
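
If the bucketing metadata has already been wiped by the repair, one possible recovery 
path (an assumption on my part, not something verified in this ticket) is to 
re-declare the bucketing spec from the Hive CLI/beeline and then confirm the storage 
descriptor before doing further bucketed inserts:

{code:java}
-- Hypothetical recovery, run from the Hive CLI/beeline after the Spark-side MSCK:
ALTER TABLE partbucket CLUSTERED BY (a) INTO 10 BUCKETS;

-- Verify that Num Buckets and Bucket Columns are restored:
DESC FORMATTED partbucket;
{code}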



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Updated] (SPARK-20697) MSCK REPAIR TABLE resets the Storage Information for bucketed hive tables.

2018-03-20 Thread Abhishek Madav (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Madav updated SPARK-20697:
---
Affects Version/s: 2.2.0
   2.2.1
   2.3.0

> MSCK REPAIR TABLE resets the Storage Information for bucketed hive tables.
> --
>
> Key: SPARK-20697
> URL: https://issues.apache.org/jira/browse/SPARK-20697
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0, 2.2.0, 2.2.1, 2.3.0
>Reporter: Abhishek Madav
>Priority: Major
>
> MSCK REPAIR TABLE, when used to recover partitions for a partitioned+bucketed 
> table, does not restore the bucketing information to the storage descriptor in the 
> metastore. 
> Steps to reproduce:
> 1) Create a partitioned+bucketed table in Hive: CREATE TABLE partbucket(a int) 
> PARTITIONED BY (b int) CLUSTERED BY (a) INTO 10 BUCKETS ROW FORMAT DELIMITED 
> FIELDS TERMINATED BY ',';
> 2) In Hive-CLI issue a desc formatted for the table.
> # col_name  data_type   comment 
>
> a int 
>
> # Partition Information
> # col_name  data_type   comment 
>
> b int 
>
> # Detailed Table Information   
> Database: sparkhivebucket  
> Owner:devbld   
> CreateTime:   Wed May 10 10:31:07 PDT 2017 
> LastAccessTime:   UNKNOWN  
> Protect Mode: None 
> Retention:0
> Location: hdfs://localhost:8020/user/hive/warehouse/partbucket 
> Table Type:   MANAGED_TABLE
> Table Parameters:  
>   transient_lastDdlTime   1494437467  
>
> # Storage Information  
> SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
>  
> InputFormat:  org.apache.hadoop.mapred.TextInputFormat 
> OutputFormat: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
> Compressed:   No   
> Num Buckets:  10   
> Bucket Columns:   [a]  
> Sort Columns: []   
> Storage Desc Params:   
>   field.delim ,   
>   serialization.format, 
> 3) In spark-shell, 
> scala> spark.sql("MSCK REPAIR TABLE partbucket")
> 4) Back to Hive-CLI 
> desc formatted partbucket;
> # col_name  data_type   comment 
>
> a int 
>
> # Partition Information
> # col_name  data_type   comment 
>
> b int 
>
> # Detailed Table Information   
> Database: sparkhivebucket  
> Owner:devbld   
> CreateTime:   Wed May 10 10:31:07 PDT 2017 
> LastAccessTime:   UNKNOWN  
> Protect Mode: None 
> Retention:0
> Location: 
> hdfs://localhost:8020/user/hive/warehouse/sparkhivebucket.db/partbucket 
> Table Type:   MANAGED_TABLE
> Table Parameters:  
>   spark.sql.partitionProvider catalog 
>   transient_lastDdlTime   1494437647  
>
> # Storage Information  
> SerDe Library:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
>  
> InputFormat:  org.apache.hadoop.mapred.TextInputFormat 
> OutputFormat: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
> Compressed:   No   
> Num Buckets:  -1   
> Bucket Columns:   []   
> Sort Columns: []   
> Storage Desc Params:   
>   field.delim ,   
>   serialization.format, 
> Further inserts to this table cannot be made in bucketed fashion through 
> Hive. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Updated] (SPARK-20697) MSCK REPAIR TABLE resets the Storage Information for bucketed hive tables.

2017-05-10 Thread Abhishek Madav (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Madav updated SPARK-20697:
---
Description: 
MSCK REPAIR TABLE, when used to recover partitions for a partitioned+bucketed table, 
does not restore the bucketing information to the storage descriptor in the 
metastore. 

Steps to reproduce:
1) Create a partitioned+bucketed table in Hive: CREATE TABLE partbucket(a int) 
PARTITIONED BY (b int) CLUSTERED BY (a) INTO 10 BUCKETS ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',';

2) In Hive-CLI issue a desc formatted for the table.

# col_name  data_type   comment 
 
a   int 
 
# Partition Information  
# col_name  data_type   comment 
 
b   int 
 
# Detailed Table Information 
Database:   sparkhivebucket  
Owner:  devbld   
CreateTime: Wed May 10 10:31:07 PDT 2017 
LastAccessTime: UNKNOWN  
Protect Mode:   None 
Retention:  0
Location:   hdfs://localhost:8020/user/hive/warehouse/partbucket 
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime   1494437467  
 
# Storage Information
SerDe Library:  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
 
InputFormat:org.apache.hadoop.mapred.TextInputFormat 
OutputFormat:   
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed: No   
Num Buckets:10   
Bucket Columns: [a]  
Sort Columns:   []   
Storage Desc Params: 
field.delim ,   
serialization.format, 

3) In spark-shell, 

scala> spark.sql("MSCK REPAIR TABLE partbucket")

4) Back to Hive-CLI 

desc formatted partbucket;

# col_name  data_type   comment 
 
a   int 
 
# Partition Information  
# col_name  data_type   comment 
 
b   int 
 
# Detailed Table Information 
Database:   sparkhivebucket  
Owner:  devbld   
CreateTime: Wed May 10 10:31:07 PDT 2017 
LastAccessTime: UNKNOWN  
Protect Mode:   None 
Retention:  0
Location:   
hdfs://localhost:8020/user/hive/warehouse/sparkhivebucket.db/partbucket 
Table Type: MANAGED_TABLE
Table Parameters:
spark.sql.partitionProvider catalog 
transient_lastDdlTime   1494437647  
 
# Storage Information
SerDe Library:  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
 
InputFormat:org.apache.hadoop.mapred.TextInputFormat 
OutputFormat:   
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed: No   
Num Buckets:-1   
Bucket Columns: []   
Sort Columns:   []   
Storage Desc Params: 
field.delim ,   
serialization.format, 


Further inserts to this table cannot be made in bucketed fashion through Hive. 

  was:
MSCK REPAIR TABLE, when used to recover partitions for a partitioned+bucketed table, 
does not restore the bucketing information to the storage descriptor in the 
metastore. 

Steps to reproduce:
1) Create a partitioned+bucketed table in Hive: CREATE TABLE partbucket(a int) 
PARTITIONED BY (b int) CLUSTERED BY (a) INTO 10 BUCKETS ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',';

2) In Hive-CLI issue a desc formatted for the table.

# col_name  data_type   comment 
 
a   int 
 
# Partition Information  
# col_name  data_type   comment 
 
b   int 
 
# Detailed Table Information 
Database:   sparkhivebucket  
Owner:   

[jira] [Created] (SPARK-20697) MSCK REPAIR TABLE resets the Storage Information for bucketed hive tables.

2017-05-10 Thread Abhishek Madav (JIRA)
Abhishek Madav created SPARK-20697:
--

 Summary: MSCK REPAIR TABLE resets the Storage Information for 
bucketed hive tables.
 Key: SPARK-20697
 URL: https://issues.apache.org/jira/browse/SPARK-20697
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0
Reporter: Abhishek Madav


MSCK REPAIR TABLE, when used to recover partitions for a partitioned+bucketed table, 
does not restore the bucketing information to the storage descriptor in the 
metastore. 

Steps to reproduce:
1) Create a partitioned+bucketed table in Hive: CREATE TABLE partbucket(a int) 
PARTITIONED BY (b int) CLUSTERED BY (a) INTO 10 BUCKETS ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',';

2) In Hive-CLI issue a desc formatted for the table.

# col_name  data_type   comment 
 
a   int 
 
# Partition Information  
# col_name  data_type   comment 
 
b   int 
 
# Detailed Table Information 
Database:   sparkhivebucket  
Owner:  devbld   
CreateTime: Wed May 10 10:31:07 PDT 2017 
LastAccessTime: UNKNOWN  
Protect Mode:   None 
Retention:  0
Location:   hdfs://localhost:8020/user/hive/warehouse/partbucket 
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime   1494437467  
 
# Storage Information
SerDe Library:  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
 
InputFormat:org.apache.hadoop.mapred.TextInputFormat 
OutputFormat:   
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed: No   
Num Buckets:10   
Bucket Columns: [a]  
Sort Columns:   []   
Storage Desc Params: 
field.delim ,   
serialization.format, 

3) In spark-shell, 

scala> spark.sql("MSCK REPAIR TABLE partbucket")

4) Back to Hive-CLI 

desc formatted partbucket;

# col_name  data_type   comment 
 
a   int 
 
# Partition Information  
# col_name  data_type   comment 
 
b   int 
 
# Detailed Table Information 
Database:   sparkhivebucket  
Owner:  devbld   
CreateTime: Wed May 10 10:31:07 PDT 2017 
LastAccessTime: UNKNOWN  
Protect Mode:   None 
Retention:  0
Location:   
hdfs://localhost:8020/user/hive/warehouse/sparkhivebucket.db/partbucket 
Table Type: MANAGED_TABLE
Table Parameters:
spark.sql.partitionProvider catalog 
transient_lastDdlTime   1494437647  
 
# Storage Information
SerDe Library:  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  
 
InputFormat:org.apache.hadoop.mapred.TextInputFormat 
OutputFormat:   
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed: No   
Num Buckets:-1   
Bucket Columns: []   
Sort Columns:   []   
Storage Desc Params: 
field.delim ,   
serialization.format, 






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)




[jira] [Commented] (SPARK-19532) [Core]`DataStreamer for file` threads of DFSOutputStream leak if set `spark.speculation` to true

2017-05-05 Thread Abhishek Madav (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999016#comment-15999016
 ] 

Abhishek Madav commented on SPARK-19532:


I am running into this issue, wherein a code path similar to hiveWriterContainer is 
trying to write to the HDFS location. I tried setting spark.speculation to false, but 
that doesn't seem to be the issue. Is there any workaround? This wait time makes the 
job run very slowly. 
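
As a sanity check before chasing this further, it may help to confirm what the running 
application actually sees for the flag; this is a generic check, not specific to this bug:

{code:java}
// In spark-shell: read the effective value back from the SparkConf.
// Using a default avoids an exception when the key was never set explicitly.
println(spark.sparkContext.getConf.get("spark.speculation", "false"))
{code}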



> [Core]`DataStreamer for file` threads of DFSOutputStream leak if set 
> `spark.speculation` to true
> 
>
> Key: SPARK-19532
> URL: https://issues.apache.org/jira/browse/SPARK-19532
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.1.0
>Reporter: StanZhai
>Priority: Critical
>
> When set `spark.speculation` to true, from thread dump page of Executor of 
> WebUI, I found that there are about 1300 threads named  "DataStreamer for 
> file 
> /test/data/test_temp/_temporary/0/_temporary/attempt_20170207172435_80750_m_69_1/part-00069-690407af-0900-46b1-9590-a6d6c696fe68.snappy.parquet"
>  in TIMED_WAITING state.
> {code}
> java.lang.Object.wait(Native Method)
> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:564)
> {code}
> The off-heap memory grows significantly until the Executor exits with an OOM 
> exception. This problem occurs only when writing data to Hadoop (tasks may be 
> killed by the Executor during writing).
> Could this be related to [https://issues.apache.org/jira/browse/HDFS-9812]? 
> The version of Hadoop is 2.6.4.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)




[jira] [Commented] (SPARK-17302) Cannot set non-Spark SQL session variables in hive-site.xml, spark-defaults.conf, or using --conf

2017-02-16 Thread Abhishek Madav (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870616#comment-15870616
 ] 

Abhishek Madav commented on SPARK-17302:


I believe this is fixed as part of SPARK-15887. Could you check? 

> Cannot set non-Spark SQL session variables in hive-site.xml, 
> spark-defaults.conf, or using --conf
> -
>
> Key: SPARK-17302
> URL: https://issues.apache.org/jira/browse/SPARK-17302
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Ryan Blue
>
> When configuration changed for 2.0 to the new SparkSession structure, Spark 
> stopped using Hive's internal HiveConf for session state and now uses 
> HiveSessionState and an associated SQLConf. Now, session options like 
> hive.exec.compress.output and hive.exec.dynamic.partition.mode are pulled 
> from this SQLConf. This doesn't include session properties from hive-site.xml 
> (including hive.exec.compress.output), and no longer contains Spark-specific 
> overrides from spark-defaults.conf that used the spark.hadoop.hive... pattern.
> Also, setting these variables on the command-line no longer works because 
> settings must start with "spark.".
> Is there a recommended way to set Hive session properties?
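
For anyone landing here with the same question, one commonly used direction (offered 
as a suggestion, not an authoritative answer) is to set Hive session options through 
Spark SQL's SET command after the session is up, so the values end up in the session's 
SQLConf that the mechanism described above reads from:

{code:java}
// Set Hive session-level options through the SparkSession; SET statements
// are stored in the session's SQLConf rather than in hive-site.xml.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("SET hive.exec.compress.output=true")
{code}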



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
