[jira] [Updated] (SPARK-32127) Check rules for MERGE INTO should use MergeAction.condition other than MergeAction.children
[ https://issues.apache.org/jira/browse/SPARK-32127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated SPARK-32127: Summary: Check rules for MERGE INTO should use MergeAction.condition other than MergeAction.children (was: Check rules for MERGE INTO should use MergeAction.condition other than MeregAction.children) > Check rules for MERGE INTO should use MergeAction.condition other than > MergeAction.children > --- > > Key: SPARK-32127 > URL: https://issues.apache.org/jira/browse/SPARK-32127 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Priority: Major > > [SPARK-30924|https://issues.apache.org/jira/browse/SPARK-30924] adds some > check rules for MERGE INTO one of which ensures the first MATCHED clause must > have a condition. However, it uses {{MergeAction.children}} in the checking > which is not accurate for the case, and it lets the below case pass the check: > {code:scala} > MERGE INTO testcat1.ns1.ns2.tbl AS target > xxx > WHEN MATCHED THEN UPDATE SET target.col2 = source.col2 > WHEN MATCHED THEN DELETE > xxx > {code} > We should use {{MergeAction.condition}} instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
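For illustration, here is a minimal sketch of the stricter check this ticket proposes, assuming simplified stand-ins for Spark's MergeAction and its condition field; this is not the actual CheckAnalysis code.
{code:scala}
// Simplified stand-in for Spark's MergeAction; only the condition matters for this check.
case class MergeAction(condition: Option[String], action: String)

def checkMatchedClauses(matchedActions: Seq[MergeAction]): Unit = {
  // Only the last MATCHED clause may omit its condition. Inspecting `condition`
  // directly avoids the imprecision of `children`, which also contains the
  // assignment expressions of an UPDATE action and can therefore be non-empty
  // even when no condition was given.
  matchedActions.dropRight(1).foreach { action =>
    require(action.condition.nonEmpty,
      "When there are multiple MATCHED clauses in a MERGE INTO statement, " +
        "only the last MATCHED clause can omit the condition.")
  }
}
{code}
With a check of this shape, the two unconditioned MATCHED clauses in the example above would be rejected instead of slipping through.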
[jira] [Updated] (SPARK-32127) Check rules for MERGE INTO should use MergeAction.condition other than MeregAction.children
[ https://issues.apache.org/jira/browse/SPARK-32127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated SPARK-32127: Description: [SPARK-30924|https://issues.apache.org/jira/browse/SPARK-30924] adds some check rules for MERGE INTO one of which ensures the first MATCHED clause must have a condition. However, it uses {{MergeAction.children}} in the checking which is not accurate for the case, and it lets the below case pass the check: {code:scala} MERGE INTO testcat1.ns1.ns2.tbl AS target xxx WHEN MATCHED THEN UPDATE SET target.col2 = source.col2 WHEN MATCHED THEN DELETE xxx {code} We should use {{MergeAction.condition}} instead. was: [SPARK-30924|https://issues.apache.org/jira/browse/SPARK-30924] adds some check rules for MERGE INTO one of which ensures the first MATCHED clause must have a condition. However, it uses {MergeAction.children} in the checking which is not accurate for the case, and it lets the below case pass the check: {code:scala} MERGE INTO testcat1.ns1.ns2.tbl AS target xxx WHEN MATCHED THEN UPDATE SET target.col2 = source.col2 WHEN MATCHED THEN DELETE xxx {code} We should use {MergeAction.condition} instead. > Check rules for MERGE INTO should use MergeAction.condition other than > MeregAction.children > --- > > Key: SPARK-32127 > URL: https://issues.apache.org/jira/browse/SPARK-32127 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Priority: Major > > [SPARK-30924|https://issues.apache.org/jira/browse/SPARK-30924] adds some > check rules for MERGE INTO one of which ensures the first MATCHED clause must > have a condition. However, it uses {{MergeAction.children}} in the checking > which is not accurate for the case, and it lets the below case pass the check: > {code:scala} > MERGE INTO testcat1.ns1.ns2.tbl AS target > xxx > WHEN MATCHED THEN UPDATE SET target.col2 = source.col2 > WHEN MATCHED THEN DELETE > xxx > {code} > We should use {{MergeAction.condition}} instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32127) Check rules for MERGE INTO should use MergeAction.condition other than MeregAction.children
Xianyin Xin created SPARK-32127: --- Summary: Check rules for MERGE INTO should use MergeAction.condition other than MeregAction.children Key: SPARK-32127 URL: https://issues.apache.org/jira/browse/SPARK-32127 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Xianyin Xin [SPARK-30924|https://issues.apache.org/jira/browse/SPARK-30924] adds some check rules for MERGE INTO one of which ensures the first MATCHED clause must have a condition. However, it uses {MergeAction.children} in the checking which is not accurate for the case, and it lets the below case pass the check: {code:scala} MERGE INTO testcat1.ns1.ns2.tbl AS target xxx WHEN MATCHED THEN UPDATE SET target.col2 = source.col2 WHEN MATCHED THEN DELETE xxx {code} We should use {MergeAction.condition} instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32030) Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
[ https://issues.apache.org/jira/browse/SPARK-32030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated SPARK-32030: Description: Now the {{MERGE INTO}} syntax is, {code:sql} MERGE INTO [db_name.]target_table [AS target_alias] USING [db_name.]source_table [] [AS source_alias] ON [ WHEN MATCHED [ AND ] THEN ] [ WHEN MATCHED [ AND ] THEN ] [ WHEN NOT MATCHED [ AND ] THEN ]{code} It would be nice if we support unlimited {{MATCHED}} and {{NOT MATCHED}} clauses in {{MERGE INTO}} statement, because users may want to deal with different "{{AND }}"s, the result of which just like a series of "{{CASE WHEN}}"s. The expected syntax looks like {code:sql} MERGE INTO [db_name.]target_table [AS target_alias] USING [db_name.]source_table [] [AS source_alias] ON [when_clause [, ...]] {code} where {{when_clause}} is {code:java} WHEN MATCHED [ AND ] THEN {code} or {code:java} WHEN NOT MATCHED [ AND ] THEN {code} was: Now the MERGE INTO syntax is, ``` MERGE INTO [db_name.]target_table [AS target_alias] USING [db_name.]source_table [] [AS source_alias] ON [ WHEN MATCHED [ AND ] THEN ] [ WHEN MATCHED [ AND ] THEN ] [ WHEN NOT MATCHED [ AND ] THEN ] ``` It would be nice if we support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO statement, because users may want to deal with different "AND "s, the result of which just like a series of "CASE WHEN"s. The expected syntax looks like ``` MERGE INTO [db_name.]target_table [AS target_alias] USING [db_name.]source_table [] [AS source_alias] ON [when_clause [, ...]] ``` where `when_clause` is ``` WHEN MATCHED [ AND ] THEN ``` or ``` WHEN NOT MATCHED [ AND ] THEN ``` > Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO > --- > > Key: SPARK-32030 > URL: https://issues.apache.org/jira/browse/SPARK-32030 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: Xianyin Xin >Priority: Major > > Now the {{MERGE INTO}} syntax is, > {code:sql} > MERGE INTO [db_name.]target_table [AS target_alias] > USING [db_name.]source_table [] [AS source_alias] > ON > [ WHEN MATCHED [ AND ] THEN ] > [ WHEN MATCHED [ AND ] THEN ] > [ WHEN NOT MATCHED [ AND ] THEN ]{code} > It would be nice if we support unlimited {{MATCHED}} and {{NOT MATCHED}} > clauses in {{MERGE INTO}} statement, because users may want to deal with > different "{{AND }}"s, the result of which just like a series of > "{{CASE WHEN}}"s. The expected syntax looks like > {code:sql} > MERGE INTO [db_name.]target_table [AS target_alias] > USING [db_name.]source_table [] [AS source_alias] > ON > [when_clause [, ...]] > {code} > where {{when_clause}} is > {code:java} > WHEN MATCHED [ AND ] THEN {code} > or > {code:java} > WHEN NOT MATCHED [ AND ] THEN {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
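For illustration, the kind of statement the proposed syntax would allow, issued through a hedged Spark SQL call; the table, column, and op values are made up.
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("merge-example").getOrCreate()

// Hypothetical tables `target` and `source`; each MATCHED / NOT MATCHED clause
// carries its own AND condition, behaving like a series of CASE WHEN branches.
spark.sql("""
  MERGE INTO target AS t
  USING source AS s
  ON t.id = s.id
  WHEN MATCHED AND s.op = 'delete' THEN DELETE
  WHEN MATCHED AND s.op = 'update' THEN UPDATE SET t.value = s.value
  WHEN MATCHED THEN UPDATE SET t.value = s.value
  WHEN NOT MATCHED AND s.value > 0 THEN INSERT (id, value) VALUES (s.id, s.value)
  WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value)
""")
{code}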
[jira] [Created] (SPARK-32030) Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO
Xianyin Xin created SPARK-32030: --- Summary: Support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO Key: SPARK-32030 URL: https://issues.apache.org/jira/browse/SPARK-32030 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.1 Reporter: Xianyin Xin Now the MERGE INTO syntax is, ``` MERGE INTO [db_name.]target_table [AS target_alias] USING [db_name.]source_table [] [AS source_alias] ON [ WHEN MATCHED [ AND ] THEN ] [ WHEN MATCHED [ AND ] THEN ] [ WHEN NOT MATCHED [ AND ] THEN ] ``` It would be nice if we support unlimited MATCHED and NOT MATCHED clauses in MERGE INTO statement, because users may want to deal with different "AND "s, the result of which just like a series of "CASE WHEN"s. The expected syntax looks like ``` MERGE INTO [db_name.]target_table [AS target_alias] USING [db_name.]source_table [] [AS source_alias] ON [when_clause [, ...]] ``` where `when_clause` is ``` WHEN MATCHED [ AND ] THEN ``` or ``` WHEN NOT MATCHED [ AND ] THEN ``` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29907) Move DELETE/UPDATE/MERGE relative rules to dmlStatementNoWith to support cte.
Xianyin Xin created SPARK-29907: --- Summary: Move DELETE/UPDATE/MERGE relative rules to dmlStatementNoWith to support cte. Key: SPARK-29907 URL: https://issues.apache.org/jira/browse/SPARK-29907 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Xianyin Xin SPARK-27444 introduced `dmlStatementNoWith` so that any DML statement that needs CTE support can leverage it. It would be better if we moved the DELETE/UPDATE/MERGE rules to `dmlStatementNoWith`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
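For illustration, a hedged example of the CTE-prefixed DML that moving these rules under `dmlStatementNoWith` would let the parser accept; the table and column names are made up.
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cte-delete-example").getOrCreate()

// Hypothetical tables; the WITH prefix only becomes reachable for DELETE/UPDATE/MERGE
// once their grammar rules live under dmlStatementNoWith.
spark.sql("""
  WITH stale AS (SELECT id FROM events WHERE ts < DATE '2019-01-01')
  DELETE FROM target WHERE id IN (SELECT id FROM stale)
""")
{code}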
[jira] [Updated] (SPARK-29835) Remove the unnecessary conversion from Statement to LogicalPlan for DELETE/UPDATE
[ https://issues.apache.org/jira/browse/SPARK-29835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated SPARK-29835: Summary: Remove the unnecessary conversion from Statement to LogicalPlan for DELETE/UPDATE (was: Remove the unneeded conversion from Statement to LogicalPlan for DELETE/UPDATE) > Remove the unnecessary conversion from Statement to LogicalPlan for > DELETE/UPDATE > - > > Key: SPARK-29835 > URL: https://issues.apache.org/jira/browse/SPARK-29835 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Priority: Major > > The current parse and analyze flow for DELETE is: 1, the SQL string will be > firstly parsed to `DeleteFromStatement`; 2, the `DeleteFromStatement` be > converted to `DeleteFromTable`. However, the SQL string can be parsed to > `DeleteFromTable` directly, where a `DeleteFromStatement` seems to be > redundant. > It is the same for UPDATE. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29835) Remove the unneeded conversion from Statement to LogicalPlan for DELETE/UPDATE
Xianyin Xin created SPARK-29835: --- Summary: Remove the unneeded conversion from Statement to LogicalPlan for DELETE/UPDATE Key: SPARK-29835 URL: https://issues.apache.org/jira/browse/SPARK-29835 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Xianyin Xin The current parse-and-analyze flow for DELETE is: 1) the SQL string is first parsed into `DeleteFromStatement`; 2) the `DeleteFromStatement` is then converted to `DeleteFromTable`. However, the SQL string can be parsed to `DeleteFromTable` directly, which makes the intermediate `DeleteFromStatement` redundant. The same applies to UPDATE. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
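For illustration, a simplified sketch of what parsing DELETE straight to the logical plan could look like, using stand-in classes rather than Spark's real AstBuilder and catalyst nodes.
{code:scala}
// Stand-ins for catalyst's UnresolvedRelation / DeleteFromTable; not the real classes.
case class UnresolvedRelation(multipartIdentifier: Seq[String])
case class DeleteFromTable(table: UnresolvedRelation, condition: Option[String])

// Building DeleteFromTable directly in the parser visitor removes the need for an
// intermediate DeleteFromStatement that would otherwise be converted during analysis.
def visitDeleteFromTable(tableName: Seq[String], whereClause: Option[String]): DeleteFromTable =
  DeleteFromTable(UnresolvedRelation(tableName), whereClause)

// Example: DELETE FROM ns1.tbl WHERE id = 1
val plan = visitDeleteFromTable(Seq("ns1", "tbl"), Some("id = 1"))
{code}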
[jira] [Commented] (SPARK-28303) Support DELETE/UPDATE/MERGE Operations in DataSource V2
[ https://issues.apache.org/jira/browse/SPARK-28303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948160#comment-16948160 ] Xianyin Xin commented on SPARK-28303: - [~echangzhang] Thank you for sharing your work with me. It seems our goals are similar but not exactly the same, and the approaches are different. In fact, this ticket plans to add DELETE/UPDATE/MERGE support to DataSource V2, which can then be implemented by concrete datasources such as file-based sources (Parquet, for example), Kudu, and JDBC. Does this meet your requirement? > Support DELETE/UPDATE/MERGE Operations in DataSource V2 > --- > > Key: SPARK-28303 > URL: https://issues.apache.org/jira/browse/SPARK-28303 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Priority: Major > > Now many datasources (delta, jdbc, hive with transaction support, kudu, etc) > supports deleting/updating data. It's necessary to add related APIs in the > datasource V2 API sets. > For example, we suggest add the below interface in V2 API, > {code:java|title=SupportsDelete.java|borderStyle=solid} > public interface SupportsDelete { > WriteBuilder delete(Filter[] filters); > } > {code} > which can delete data by simple predicates (complicated cases like correlated > subquery is not considered currently). > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29420) support MERGE INTO in the parser and add the corresponding logical plan
[ https://issues.apache.org/jira/browse/SPARK-29420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin resolved SPARK-29420. - Resolution: Duplicate Resolve this as duplicate. Move to [#SPARK-28893]. > support MERGE INTO in the parser and add the corresponding logical plan > --- > > Key: SPARK-29420 > URL: https://issues.apache.org/jira/browse/SPARK-29420 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28893) support MERGE INTO in the parser and add the corresponding logical plan
[ https://issues.apache.org/jira/browse/SPARK-28893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated SPARK-28893: Summary: support MERGE INTO in the parser and add the corresponding logical plan (was: Support MERGE INTO in DataSource V2) > support MERGE INTO in the parser and add the corresponding logical plan > --- > > Key: SPARK-28893 > URL: https://issues.apache.org/jira/browse/SPARK-28893 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29420) support MERGE INTO in the parser and add the corresponding logical plan
Xianyin Xin created SPARK-29420: --- Summary: support MERGE INTO in the parser and add the corresponding logical plan Key: SPARK-29420 URL: https://issues.apache.org/jira/browse/SPARK-29420 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Xianyin Xin -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29049) Rename DataSourceStrategy#normalizeFilters to DataSourceStrategy#normalizeAttrNames
[ https://issues.apache.org/jira/browse/SPARK-29049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16931017#comment-16931017 ] Xianyin Xin commented on SPARK-29049: - [~hyukjin.kwon], I updated the description. PR [https://github.com/apache/spark/pull/25626|https://github.com/apache/spark/pull/25626] will use it to normalize the attribute in {{Expression}}s. > Rename DataSourceStrategy#normalizeFilters to > DataSourceStrategy#normalizeAttrNames > --- > > Key: SPARK-29049 > URL: https://issues.apache.org/jira/browse/SPARK-29049 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Priority: Minor > > DataSourceStrategy#normalizeFilters can also be used to normalize attributes > in {{Expression}}, not limit to {{Filter}}. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
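For illustration, a hedged sketch of what "normalizing attribute names" means here (rewriting attribute references in any expression to the exact-cased names of the relation's output), using stand-in classes rather than the real DataSourceStrategy code.
{code:scala}
// Stand-ins for catalyst attributes and expressions; not Spark's real classes.
case class Attr(name: String)
sealed trait Expr
case class AttrRef(attr: Attr) extends Expr
case class Literal(value: Any) extends Expr
case class EqualTo(left: Expr, right: Expr) extends Expr

// Works on any expression tree, not only on pushed-down Filters,
// which is the motivation for the proposed rename.
def normalizeAttrNames(expr: Expr, output: Seq[Attr]): Expr = expr match {
  case AttrRef(a) =>
    AttrRef(output.find(_.name.equalsIgnoreCase(a.name)).getOrElse(a))
  case EqualTo(l, r) =>
    EqualTo(normalizeAttrNames(l, output), normalizeAttrNames(r, output))
  case other => other
}
{code}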
[jira] [Updated] (SPARK-29049) Rename DataSourceStrategy#normalizeFilters to DataSourceStrategy#normalizeAttrNames
[ https://issues.apache.org/jira/browse/SPARK-29049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated SPARK-29049: Description: DataSourceStrategy#normalizeFilters can also be used to normalize attributes in {{Expression}}, not limit to {{Filter}}. (was: DataSourceStrategy#normalizeFilters can also be used to normalize attributes in \{{Expression}} s, not limit to \{{Filter}}s.) > Rename DataSourceStrategy#normalizeFilters to > DataSourceStrategy#normalizeAttrNames > --- > > Key: SPARK-29049 > URL: https://issues.apache.org/jira/browse/SPARK-29049 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Priority: Minor > > DataSourceStrategy#normalizeFilters can also be used to normalize attributes > in {{Expression}}, not limit to {{Filter}}. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29049) Rename DataSourceStrategy#normalizeFilters to DataSourceStrategy#normalizeAttrNames
[ https://issues.apache.org/jira/browse/SPARK-29049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated SPARK-29049: Description: DataSourceStrategy#normalizeFilters can also be used to normalize attributes in {{Expression}}s, not limit to {{Filter}}s. (was: DataSourceStrategy#normalizeFilters can also be used to normalize attributes in `Expression`s, not limit to `Filter`s.) > Rename DataSourceStrategy#normalizeFilters to > DataSourceStrategy#normalizeAttrNames > --- > > Key: SPARK-29049 > URL: https://issues.apache.org/jira/browse/SPARK-29049 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Priority: Minor > > DataSourceStrategy#normalizeFilters can also be used to normalize attributes > in {{Expression}}s, not limit to {{Filter}}s. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29049) Rename DataSourceStrategy#normalizeFilters to DataSourceStrategy#normalizeAttrNames
[ https://issues.apache.org/jira/browse/SPARK-29049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated SPARK-29049: Description: DataSourceStrategy#normalizeFilters can also be used to normalize attributes in {{Expression}} s, not limit to {{Filter}} s. (was: DataSourceStrategy#normalizeFilters can also be used to normalize attributes in {{Expression}}s, not limit to {{Filter}}s.) > Rename DataSourceStrategy#normalizeFilters to > DataSourceStrategy#normalizeAttrNames > --- > > Key: SPARK-29049 > URL: https://issues.apache.org/jira/browse/SPARK-29049 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Priority: Minor > > DataSourceStrategy#normalizeFilters can also be used to normalize attributes > in {{Expression}} s, not limit to {{Filter}} s. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29049) Rename DataSourceStrategy#normalizeFilters to DataSourceStrategy#normalizeAttrNames
[ https://issues.apache.org/jira/browse/SPARK-29049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated SPARK-29049: Description: DataSourceStrategy#normalizeFilters can also be used to normalize attributes in \{{Expression}} s, not limit to \{{Filter}}s. (was: DataSourceStrategy#normalizeFilters can also be used to normalize attributes in {{Expression}} s, not limit to {{Filter}} s.) > Rename DataSourceStrategy#normalizeFilters to > DataSourceStrategy#normalizeAttrNames > --- > > Key: SPARK-29049 > URL: https://issues.apache.org/jira/browse/SPARK-29049 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Priority: Minor > > DataSourceStrategy#normalizeFilters can also be used to normalize attributes > in \{{Expression}} s, not limit to \{{Filter}}s. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29049) Rename DataSourceStrategy#normalizeFilters to DataSourceStrategy#normalizeAttrNames
[ https://issues.apache.org/jira/browse/SPARK-29049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated SPARK-29049: Description: DataSourceStrategy#normalizeFilters can also be used to normalize attributes in `Expression`s, not limit to `Filter`s. > Rename DataSourceStrategy#normalizeFilters to > DataSourceStrategy#normalizeAttrNames > --- > > Key: SPARK-29049 > URL: https://issues.apache.org/jira/browse/SPARK-29049 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Priority: Minor > > DataSourceStrategy#normalizeFilters can also be used to normalize attributes > in `Expression`s, not limit to `Filter`s. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29049) Rename DataSourceStrategy#normalizeFilters to DataSourceStrategy#normalizeAttrNames
Xianyin Xin created SPARK-29049: --- Summary: Rename DataSourceStrategy#normalizeFilters to DataSourceStrategy#normalizeAttrNames Key: SPARK-29049 URL: https://issues.apache.org/jira/browse/SPARK-29049 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Xianyin Xin -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28950) FollowingUp: Change whereClause to be optional in DELETE
Xianyin Xin created SPARK-28950: --- Summary: FollowingUp: Change whereClause to be optional in DELETE Key: SPARK-28950 URL: https://issues.apache.org/jira/browse/SPARK-28950 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Xianyin Xin -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28950) [SPARK-28351] FollowingUp: Change whereClause to be optional in DELETE
[ https://issues.apache.org/jira/browse/SPARK-28950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated SPARK-28950: Summary: [SPARK-28351] FollowingUp: Change whereClause to be optional in DELETE (was: FollowingUp: Change whereClause to be optional in DELETE) > [SPARK-28351] FollowingUp: Change whereClause to be optional in DELETE > -- > > Key: SPARK-28950 > URL: https://issues.apache.org/jira/browse/SPARK-28950 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28923) Deduplicate the codes 'multipartIdentifier' and 'identifierSeq'
[ https://issues.apache.org/jira/browse/SPARK-28923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin resolved SPARK-28923. - Resolution: Invalid > Deduplicate the codes 'multipartIdentifier' and 'identifierSeq' > --- > > Key: SPARK-28923 > URL: https://issues.apache.org/jira/browse/SPARK-28923 > Project: Spark > Issue Type: Request > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Priority: Minor > > In {{sqlbase.g4}}, {{multipartIdentifier}} and {{identifierSeq}} have the > same functionality. We'd better deduplicate them. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28923) Deduplicate the codes 'multipartIdentifier' and 'identifierSeq'
Xianyin Xin created SPARK-28923: --- Summary: Deduplicate the codes 'multipartIdentifier' and 'identifierSeq' Key: SPARK-28923 URL: https://issues.apache.org/jira/browse/SPARK-28923 Project: Spark Issue Type: Request Components: SQL Affects Versions: 3.0.0 Reporter: Xianyin Xin In {{sqlbase.g4}}, {{multipartIdentifier}} and {{identifierSeq}} have the same functionality. We'd better deduplicate them. -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28893) Support MERGE INTO in DataSource V2
Xianyin Xin created SPARK-28893: --- Summary: Support MERGE INTO in DataSource V2 Key: SPARK-28893 URL: https://issues.apache.org/jira/browse/SPARK-28893 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Xianyin Xin -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28892) Support UPDATE in DataSource V2
[ https://issues.apache.org/jira/browse/SPARK-28892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated SPARK-28892: Affects Version/s: (was: 2.4.3) 3.0.0 > Support UPDATE in DataSource V2 > --- > > Key: SPARK-28892 > URL: https://issues.apache.org/jira/browse/SPARK-28892 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28892) Support UPDATE in DataSource V2
Xianyin Xin created SPARK-28892: --- Summary: Support UPDATE in DataSource V2 Key: SPARK-28892 URL: https://issues.apache.org/jira/browse/SPARK-28892 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.4.3 Reporter: Xianyin Xin -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885019#comment-16885019 ] Xianyin Xin commented on SPARK-21067: - [~dricard],does the attached patch work in your case? > Thrift Server - CTAS fail with Unable to move source > > > Key: SPARK-21067 > URL: https://issues.apache.org/jira/browse/SPARK-21067 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0, 2.4.0, 2.4.3 > Environment: Yarn > Hive MetaStore > HDFS (HA) >Reporter: Dominic Ricard >Priority: Major > Attachments: SPARK-21067.patch > > > After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS > would fail, sometimes... > Most of the time, the CTAS would work only once, after starting the thrift > server. After that, dropping the table and re-issuing the same CTAS would > fail with the following message (Sometime, it fails right away, sometime it > work for a long period of time): > {noformat} > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > We have already found the following Jira > (https://issues.apache.org/jira/browse/SPARK-11021) which state that the > {{hive.exec.stagingdir}} had to be added in order for Spark to be able to > handle CREATE TABLE properly as of 2.0. As you can see in the error, we have > ours set to "/tmp/hive-staging/\{user.name\}" > Same issue with INSERT statements: > {noformat} > CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE > dricard.test SELECT 1; > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > This worked fine in 1.6.2, which we currently run in our Production > Environment but since 2.0+, we haven't been able to CREATE TABLE consistently > on the cluster. > SQL to reproduce issue: > {noformat} > DROP SCHEMA IF EXISTS dricard CASCADE; > CREATE SCHEMA dricard; > CREATE TABLE dricard.test (col1 int); > INSERT INTO TABLE dricard.test SELECT 1; > SELECT * from dricard.test; > DROP TABLE dricard.test; > CREATE TABLE dricard.test AS select 1 as `col1`; > SELECT * from dricard.test > {noformat} > Thrift server usually fails at INSERT... > Tried the same procedure in a spark context using spark.sql() and didn't > encounter the same issue. 
> Full stack Trace: > {noformat} > 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error > executing query, currentState RUNNING, > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0 > to desti > nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) > at > org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) > at org.apache.spark.sql.Dataset.(Dataset.scala:185) > at
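For reference, the staging-directory setting discussed in this report is an ordinary Hive conf that can be passed to the session. A hedged example follows; the path is the one quoted in the report and is illustrative only, not a verified fix for this issue.
{code:scala}
import org.apache.spark.sql.SparkSession

// Illustrative only: shows where the hive.exec.stagingdir setting mentioned above is applied.
val spark = SparkSession.builder()
  .appName("ctas-staging-example")
  .config("hive.exec.stagingdir", "/tmp/hive-staging/${user.name}")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("CREATE TABLE dricard.test AS SELECT 1 AS `col1`")
{code}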
[jira] [Updated] (SPARK-28303) Support DELETE/UPDATE/MERGE Operations in DataSource V2
[ https://issues.apache.org/jira/browse/SPARK-28303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated SPARK-28303: Description: Now many datasources (delta, jdbc, hive with transaction support, kudu, etc) supports deleting/updating data. It's necessary to add related APIs in the datasource V2 API sets. For example, we suggest add the below interface in V2 API, {code:java|title=SupportsDelete.java|borderStyle=solid} public interface SupportsDelete { WriteBuilder delete(Filter[] filters); } {code} which can delete data by simple predicates (complicated cases like correlated subquery is not considered currently). was: Now many datasources (delta, jdbc, hive with transaction support, kudu, etc) supports deleting/updating data. It's necessary to add related APIs in the datasource V2 API sets. For example, we suggest add the below interface in V2 API, {code:title=SupportsDelete.java|borderStyle=solid} public interface SupportsDelete extends WriteBuilder { WriteBuilder delete(Filter[] filters); } {code} which can delete data by simple predicates (complicated cases like correlated subquery is not considered currently). > Support DELETE/UPDATE/MERGE Operations in DataSource V2 > --- > > Key: SPARK-28303 > URL: https://issues.apache.org/jira/browse/SPARK-28303 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.3 >Reporter: Xianyin Xin >Priority: Major > > Now many datasources (delta, jdbc, hive with transaction support, kudu, etc) > supports deleting/updating data. It's necessary to add related APIs in the > datasource V2 API sets. > For example, we suggest add the below interface in V2 API, > {code:java|title=SupportsDelete.java|borderStyle=solid} > public interface SupportsDelete { > WriteBuilder delete(Filter[] filters); > } > {code} > which can delete data by simple predicates (complicated cases like correlated > subquery is not considered currently). > -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
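For illustration, a hedged Scala sketch of how a source might implement the interface proposed above; SupportsDelete, WriteBuilder, and Filter are modeled as local stand-ins because the proposal is not a released API.
{code:scala}
// Local stand-ins mirroring the proposal above; not the shipped DataSource V2 API.
trait WriteBuilder
case class Filter(attribute: String, value: Any)

trait SupportsDelete {
  def delete(filters: Array[Filter]): WriteBuilder
}

// A concrete source (JDBC, Kudu, a file-based source, ...) would translate the
// pushed-down filters into its own delete operation here.
class ExampleDeleteBuilder extends WriteBuilder with SupportsDelete {
  override def delete(filters: Array[Filter]): WriteBuilder = {
    val predicate = filters.map(f => f.attribute + " = " + f.value).mkString(" AND ")
    println(s"deleting rows where $predicate")
    this
  }
}
{code}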
[jira] [Resolved] (SPARK-28350) Support DELETE in DataSource V2
[ https://issues.apache.org/jira/browse/SPARK-28350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin resolved SPARK-28350. - Resolution: Duplicate Resolve this, and create sub-task of [SPARK-28303|https://issues.apache.org/jira/browse/SPARK-28303] instead. > Support DELETE in DataSource V2 > --- > > Key: SPARK-28350 > URL: https://issues.apache.org/jira/browse/SPARK-28350 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Priority: Major > > This ticket add the DELETE support for V2 datasources. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28351) Support DELETE in DataSource V2
Xianyin Xin created SPARK-28351: --- Summary: Support DELETE in DataSource V2 Key: SPARK-28351 URL: https://issues.apache.org/jira/browse/SPARK-28351 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Xianyin Xin This ticket add the DELETE support for V2 datasources. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28350) Support DELETE in DataSource V2
Xianyin Xin created SPARK-28350: --- Summary: Support DELETE in DataSource V2 Key: SPARK-28350 URL: https://issues.apache.org/jira/browse/SPARK-28350 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Xianyin Xin This ticket add the DELETE support for V2 datasources. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-28303) Support DELETE/UPDATE/MERGE Operations in DataSource V2
[ https://issues.apache.org/jira/browse/SPARK-28303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880258#comment-16880258 ] Xianyin Xin commented on SPARK-28303: - The SQL entry point is also needed. > Support DELETE/UPDATE/MERGE Operations in DataSource V2 > --- > > Key: SPARK-28303 > URL: https://issues.apache.org/jira/browse/SPARK-28303 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.3 >Reporter: Xianyin Xin >Priority: Major > > Now many datasources (delta, jdbc, hive with transaction support, kudu, etc) > supports deleting/updating data. It's necessary to add related APIs in the > datasource V2 API sets. > For example, we suggest add the below interface in V2 API, > {code:title=SupportsDelete.java|borderStyle=solid} > public interface SupportsDelete extends WriteBuilder { > WriteBuilder delete(Filter[] filters); > } > {code} > which can delete data by simple predicates (complicated cases like correlated > subquery is not considered currently). > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28303) Support DELETE/UPDATE/MERGE Operations in DataSource V2
[ https://issues.apache.org/jira/browse/SPARK-28303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated SPARK-28303: Description: Now many datasources (delta, jdbc, hive with transaction support, kudu, etc) supports deleting/updating data. It's necessary to add related APIs in the datasource V2 API sets. For example, we suggest add the below interface in V2 API, {code:title=SupportsDelete.java|borderStyle=solid} public interface SupportsDelete extends WriteBuilder { WriteBuilder delete(Filter[] filters); } {code} which can delete data by simple predicates (complicated cases like correlated subquery is not considered currently). was: Now many datasources (delta, jdbc, hive with transaction support, kudu, etc) supports deleting/updating data. It's necessary to add related APIs in the datasource V2 API sets. For example, we suggest add the below interface in V2 API, ```java public interface SupportsDelete extends WriteBuilder { WriteBuilder delete(Filter[] filters); } ``` which can delete data by simple predicates (complicated cases like correlated subquery is not considered currently). > Support DELETE/UPDATE/MERGE Operations in DataSource V2 > --- > > Key: SPARK-28303 > URL: https://issues.apache.org/jira/browse/SPARK-28303 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.3 >Reporter: Xianyin Xin >Priority: Major > > Now many datasources (delta, jdbc, hive with transaction support, kudu, etc) > supports deleting/updating data. It's necessary to add related APIs in the > datasource V2 API sets. > For example, we suggest add the below interface in V2 API, > {code:title=SupportsDelete.java|borderStyle=solid} > public interface SupportsDelete extends WriteBuilder { > WriteBuilder delete(Filter[] filters); > } > {code} > which can delete data by simple predicates (complicated cases like correlated > subquery is not considered currently). > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28303) Support DELETE/UPDATE/MERGE Operations in DataSource V2
Xianyin Xin created SPARK-28303: --- Summary: Support DELETE/UPDATE/MERGE Operations in DataSource V2 Key: SPARK-28303 URL: https://issues.apache.org/jira/browse/SPARK-28303 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.3 Reporter: Xianyin Xin Now many datasources (delta, jdbc, hive with transaction support, kudu, etc) supports deleting/updating data. It's necessary to add related APIs in the datasource V2 API sets. For example, we suggest add the below interface in V2 API, ```java public interface SupportsDelete extends WriteBuilder { WriteBuilder delete(Filter[] filters); } ``` which can delete data by simple predicates (complicated cases like correlated subquery is not considered currently). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
[ https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16865272#comment-16865272 ] Xianyin Xin commented on SPARK-27714: - [~nkollar], sorry for the late reply. Yes, it's similar to the implementation in Postgres. However, it is not a replacement for, or an alternative to, the current join-reorder logic (DP); it is a supplement to DP. DP is used when the number of joined tables is small (<12 in Spark today), while GA is used when the number of joined tables is large, because as the number of joined tables grows, DP spends a lot of time searching for the best join plan. GA accelerates that search. TPC-DS q64 is an example: our experiment shows the execution time decreased from 1300+s to 200+s for 10 TB TPC-DS q64 on an 18-node cluster. > Support Join Reorder based on Genetic Algorithm when the # of joined tables > > 12 > > > Key: SPARK-27714 > URL: https://issues.apache.org/jira/browse/SPARK-27714 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xianyin Xin >Priority: Major > > Now the join reorder logic is based on dynamic planning which can find the > most optimized plan theoretically, but the searching cost grows rapidly with > the # of joined tables grows. It would be better to introduce Genetic > algorithm (GA) to overcome this problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
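For context, the DP-based reorder referred to above is governed by existing CBO settings; a hedged example of the relevant knobs follows. The GA-based reorder itself is only proposed here and has no configuration of its own yet.
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cbo-join-reorder").getOrCreate()

// Cost-based optimization and DP join reorder must both be enabled.
spark.conf.set("spark.sql.cbo.enabled", "true")
spark.conf.set("spark.sql.cbo.joinReorder.enabled", "true")
// DP-based reorder only applies when the number of joined items is at or below
// this threshold (12 by default), which is the gap the GA proposal aims to fill.
spark.conf.set("spark.sql.cbo.joinReorder.dp.threshold", "12")
{code}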
[jira] [Commented] (SPARK-15348) Hive ACID
[ https://issues.apache.org/jira/browse/SPARK-15348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848881#comment-16848881 ] Xianyin Xin commented on SPARK-15348: - The starting point (or goal) of Delta Lake is not ACID but the "Data Lake"; ACID is just one of its features. The ACID designs of Hive and Delta are very different, and both have pros and cons. However, from Spark's perspective a Hive table and a Delta table are just two datasources, so pluggable ACID support for different datasources within one framework is an option. Maybe the DataSource V2 API can handle this. > Hive ACID > - > > Key: SPARK-15348 > URL: https://issues.apache.org/jira/browse/SPARK-15348 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 1.6.3, 2.0.2, 2.1.2, 2.2.0, 2.3.0 >Reporter: Ran Haim >Priority: Major > > Spark does not support any feature of hive's transactional tables, > you cannot use spark to delete/update a table and it also has problems > reading the aggregated data when no compaction was done. > Also it seems that compaction is not supported - alter table ... partition > COMPACT 'major' -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27722) Remove UnsafeKeyValueSorter
[ https://issues.apache.org/jira/browse/SPARK-27722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840971#comment-16840971 ] Xianyin Xin commented on SPARK-27722: - When doing the move, I didn't find any reference to this class either. > Remove UnsafeKeyValueSorter > --- > > Key: SPARK-27722 > URL: https://issues.apache.org/jira/browse/SPARK-27722 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Liang-Chi Hsieh >Priority: Minor > > We just moved the location of classes including {{UnsafeKeyValueSorter}}. > After further investigating, I don't find where {{UnsafeKeyValueSorter}} is > used. > If it is not used at all, shall we just remove it from codebase? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
[ https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840370#comment-16840370 ] Xianyin Xin commented on SPARK-27714: - [~hyukjin.kwon] Thanks for reminding. [~hyukjin.kwon] [~viirya] Thank you for comments. I'm working on a doc, will post it later on. > Support Join Reorder based on Genetic Algorithm when the # of joined tables > > 12 > > > Key: SPARK-27714 > URL: https://issues.apache.org/jira/browse/SPARK-27714 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.3 >Reporter: Xianyin Xin >Priority: Major > > Now the join reorder logic is based on dynamic planning which can find the > most optimized plan theoretically, but the searching cost grows rapidly with > the # of joined tables grows. It would be better to introduce Genetic > algorithm (GA) to overcome this problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
[ https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated SPARK-27714: Description: Now the join reorder logic is based on dynamic planning which can find the most optimized plan theoretically, but the searching cost grows rapidly with the # of joined tables grows. It would be better to introduce Genetic algorithm (GA) to overcome this problem. (was: Now the join reorder logic is based on dynamic planning which can find the most optimized plan theoretically, but the searching cost grows rapidly with the # of joined tables grows. It would be better in introduce Genetic algorithm (GA) to overcome this problem.) > Support Join Reorder based on Genetic Algorithm when the # of joined tables > > 12 > > > Key: SPARK-27714 > URL: https://issues.apache.org/jira/browse/SPARK-27714 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.3 >Reporter: Xianyin Xin >Priority: Major > > Now the join reorder logic is based on dynamic planning which can find the > most optimized plan theoretically, but the searching cost grows rapidly with > the # of joined tables grows. It would be better to introduce Genetic > algorithm (GA) to overcome this problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27714) Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
[ https://issues.apache.org/jira/browse/SPARK-27714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated SPARK-27714: Summary: Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12 (was: Support Join Reorder based on Genetic algorithm when the # of joined tables > 12) > Support Join Reorder based on Genetic Algorithm when the # of joined tables > > 12 > > > Key: SPARK-27714 > URL: https://issues.apache.org/jira/browse/SPARK-27714 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.3 >Reporter: Xianyin Xin >Priority: Major > > Now the join reorder logic is based on dynamic planning which can find the > most optimized plan theoretically, but the searching cost grows rapidly with > the # of joined tables grows. It would be better in introduce Genetic > algorithm (GA) to overcome this problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27714) Support Join Reorder based on Genetic algorithm when the # of joined tables > 12
Xianyin Xin created SPARK-27714: --- Summary: Support Join Reorder based on Genetic algorithm when the # of joined tables > 12 Key: SPARK-27714 URL: https://issues.apache.org/jira/browse/SPARK-27714 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.3 Reporter: Xianyin Xin Now the join reorder logic is based on dynamic planning which can find the most optimized plan theoretically, but the searching cost grows rapidly with the # of joined tables grows. It would be better in introduce Genetic algorithm (GA) to overcome this problem. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27713) Move RecordBinaryComparator and unsafe sorters from catalyst project to core
Xianyin Xin created SPARK-27713: --- Summary: Move RecordBinaryComparator and unsafe sorters from catalyst project to core Key: SPARK-27713 URL: https://issues.apache.org/jira/browse/SPARK-27713 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.3 Reporter: Xianyin Xin `RecordBinaryComparator`, `UnsafeExternalRowSorter` and `UnsafeKeyValueSorter` currently live in the catalyst project and should be moved to core, as they are used only in the physical plan. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790119#comment-16790119 ] Xianyin Xin commented on SPARK-21067: - Yep, [~Moriarty279] , nice analysis. > Thrift Server - CTAS fail with Unable to move source > > > Key: SPARK-21067 > URL: https://issues.apache.org/jira/browse/SPARK-21067 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0, 2.4.0 > Environment: Yarn > Hive MetaStore > HDFS (HA) >Reporter: Dominic Ricard >Priority: Major > Attachments: SPARK-21067.patch > > > After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS > would fail, sometimes... > Most of the time, the CTAS would work only once, after starting the thrift > server. After that, dropping the table and re-issuing the same CTAS would > fail with the following message (Sometime, it fails right away, sometime it > work for a long period of time): > {noformat} > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > We have already found the following Jira > (https://issues.apache.org/jira/browse/SPARK-11021) which state that the > {{hive.exec.stagingdir}} had to be added in order for Spark to be able to > handle CREATE TABLE properly as of 2.0. As you can see in the error, we have > ours set to "/tmp/hive-staging/\{user.name\}" > Same issue with INSERT statements: > {noformat} > CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE > dricard.test SELECT 1; > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > This worked fine in 1.6.2, which we currently run in our Production > Environment but since 2.0+, we haven't been able to CREATE TABLE consistently > on the cluster. > SQL to reproduce issue: > {noformat} > DROP SCHEMA IF EXISTS dricard CASCADE; > CREATE SCHEMA dricard; > CREATE TABLE dricard.test (col1 int); > INSERT INTO TABLE dricard.test SELECT 1; > SELECT * from dricard.test; > DROP TABLE dricard.test; > CREATE TABLE dricard.test AS select 1 as `col1`; > SELECT * from dricard.test > {noformat} > Thrift server usually fails at INSERT... > Tried the same procedure in a spark context using spark.sql() and didn't > encounter the same issue. 
> Full stack Trace: > {noformat} > 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error > executing query, currentState RUNNING, > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0 > to desti > nation hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) > at > org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) > at org.apache.spark.sql.Dataset.(Dataset.scala:185) > at
[jira] [Commented] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774675#comment-16774675 ] Xianyin Xin commented on SPARK-21067: - Upload a patch which is based on 2.3.2. > Thrift Server - CTAS fail with Unable to move source > > > Key: SPARK-21067 > URL: https://issues.apache.org/jira/browse/SPARK-21067 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0 > Environment: Yarn > Hive MetaStore > HDFS (HA) >Reporter: Dominic Ricard >Priority: Major > Attachments: SPARK-21067.patch > > > After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS > would fail, sometimes... > Most of the time, the CTAS would work only once, after starting the thrift > server. After that, dropping the table and re-issuing the same CTAS would > fail with the following message (Sometime, it fails right away, sometime it > work for a long period of time): > {noformat} > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > We have already found the following Jira > (https://issues.apache.org/jira/browse/SPARK-11021) which state that the > {{hive.exec.stagingdir}} had to be added in order for Spark to be able to > handle CREATE TABLE properly as of 2.0. As you can see in the error, we have > ours set to "/tmp/hive-staging/\{user.name\}" > Same issue with INSERT statements: > {noformat} > CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE > dricard.test SELECT 1; > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > This worked fine in 1.6.2, which we currently run in our Production > Environment but since 2.0+, we haven't been able to CREATE TABLE consistently > on the cluster. > SQL to reproduce issue: > {noformat} > DROP SCHEMA IF EXISTS dricard CASCADE; > CREATE SCHEMA dricard; > CREATE TABLE dricard.test (col1 int); > INSERT INTO TABLE dricard.test SELECT 1; > SELECT * from dricard.test; > DROP TABLE dricard.test; > CREATE TABLE dricard.test AS select 1 as `col1`; > SELECT * from dricard.test > {noformat} > Thrift server usually fails at INSERT... > Tried the same procedure in a spark context using spark.sql() and didn't > encounter the same issue. 
> Full stack Trace: > {noformat} > 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error > executing query, currentState RUNNING, > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0 > to destination hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) > at > org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) > at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185) > at
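The report above points at SPARK-11021 and the {{hive.exec.stagingdir}} property. As a minimal, hedged sketch only (the property name and the \{user.name\}-style path come from the description; the SparkSession-based setup, application name, and literal user directory are assumptions, not the reporter's actual deployment), the staging directory can be supplied as a Hive setting when a Hive-enabled session is built:
{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical setup: forwards hive.exec.stagingdir to the Hive client used
// by Spark SQL. The value mirrors the "/tmp/hive-staging/{user.name}" pattern
// quoted in the report, with a literal user name substituted for illustration.
val spark = SparkSession.builder()
  .appName("ctas-staging-dir-sketch")
  .config("hive.exec.stagingdir", "/tmp/hive-staging/dricard")
  .enableHiveSupport()
  .getOrCreate()
{code}
Placing the same key in hive-site.xml on the Thrift Server's classpath is the more conventional route; the sketch only illustrates where the property plugs in, not a fix for the move failure described here.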
[jira] [Updated] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
[ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xianyin Xin updated SPARK-21067: Attachment: SPARK-21067.patch > Thrift Server - CTAS fail with Unable to move source > > > Key: SPARK-21067 > URL: https://issues.apache.org/jira/browse/SPARK-21067 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0 > Environment: Yarn > Hive MetaStore > HDFS (HA) >Reporter: Dominic Ricard >Priority: Major > Attachments: SPARK-21067.patch > > > After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS > would fail, sometimes... > Most of the time, the CTAS would work only once, after starting the thrift > server. After that, dropping the table and re-issuing the same CTAS would > fail with the following message (Sometimes it fails right away, sometimes it > works for a long period of time): > {noformat} > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > We have already found the following Jira > (https://issues.apache.org/jira/browse/SPARK-11021) which states that the > {{hive.exec.stagingdir}} had to be added in order for Spark to be able to > handle CREATE TABLE properly as of 2.0. As you can see in the error, we have > ours set to "/tmp/hive-staging/\{user.name\}". > Same issue with INSERT statements: > {noformat} > CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE > dricard.test SELECT 1; > Error: org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-1/part-0 > to destination > hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > (state=,code=0) > {noformat} > This worked fine in 1.6.2, which we currently run in our Production > Environment, but since 2.0+, we haven't been able to CREATE TABLE consistently > on the cluster. > SQL to reproduce issue: > {noformat} > DROP SCHEMA IF EXISTS dricard CASCADE; > CREATE SCHEMA dricard; > CREATE TABLE dricard.test (col1 int); > INSERT INTO TABLE dricard.test SELECT 1; > SELECT * from dricard.test; > DROP TABLE dricard.test; > CREATE TABLE dricard.test AS select 1 as `col1`; > SELECT * from dricard.test > {noformat} > Thrift server usually fails at INSERT... > Tried the same procedure in a spark context using spark.sql() and didn't > encounter the same issue.
> Full stack Trace: > {noformat} > 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error > executing query, currentState RUNNING, > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source > hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-1/part-0 > to destination hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-0; > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106) > at > org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221) > at > org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) > at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) > at
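For contrast with the Thrift Server path, the reporter notes that running the same statements through spark.sql() did not reproduce the failure. A minimal sketch of that check, assuming a Hive-enabled Spark build (the schema, table, and statements are the ones from the "SQL to reproduce issue" block above; the session setup and application name are assumptions):
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ctas-spark-sql-check")
  .enableHiveSupport()
  .getOrCreate()

// Same sequence as the reproduction SQL, issued directly through spark.sql()
// instead of the Thrift Server JDBC endpoint.
spark.sql("DROP SCHEMA IF EXISTS dricard CASCADE")
spark.sql("CREATE SCHEMA dricard")
spark.sql("CREATE TABLE dricard.test (col1 int)")
spark.sql("INSERT INTO TABLE dricard.test SELECT 1")
spark.sql("SELECT * FROM dricard.test").show()
spark.sql("DROP TABLE dricard.test")
spark.sql("CREATE TABLE dricard.test AS SELECT 1 AS `col1`")
spark.sql("SELECT * FROM dricard.test").show()
{code}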