[jira] [Commented] (SPARK-21158) SparkSQL function SparkSession.Catalog.ListTables() does not handle spark setting for case-sensitivity

2017-10-28 Thread Wenchen Fan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16223667#comment-16223667
 ] 

Wenchen Fan commented on SPARK-21158:
-

I think this is a reasonable feature request, i.e. making 
{{Catalog.listTables}} case preserving. However, it requires changing how Spark 
SQL implements case sensitivity, which is a big change. I'd like to mark 
this ticket as "later" because the benefit here is small and we may not have 
time to do it soon. Any objections? cc [~smilegator] [~srowen]

> SparkSQL function SparkSession.Catalog.ListTables() does not handle spark 
> setting for case-sensitivity
> --
>
> Key: SPARK-21158
> URL: https://issues.apache.org/jira/browse/SPARK-21158
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: Windows 10
> IntelliJ 
> Scala
>Reporter: Kathryn McClintic
>Priority: Minor
>  Labels: easyfix, features, sparksql, windows
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When working with SQL table names in Spark SQL, we have noticed some issues 
> with case sensitivity.
> If you set the spark.sql.caseSensitive setting to true, Spark SQL stores 
> table names as they were provided. This is correct.
> If you set spark.sql.caseSensitive to false, Spark SQL stores table names 
> in lower case.
> We then use sqlContext.tableNames() to get all the tables in our DB, and we 
> check whether this list contains("<table name>") to determine if we have 
> already created a table. If case sensitivity is turned off (false), this 
> lookup should match the table name regardless of case.
> However, it only matches the lower-case version of the stored name. 
> Therefore, if you pass in a camel- or upper-case table name, the lookup 
> returns false when in fact the table does exist.
> The root cause of this issue is in 
> SparkSession.Catalog.listTables().
> For example:
> In your SQL context you have five tables, and you have chosen 
> spark.sql.caseSensitive=false, so your tables are stored in lower case: 
> carnames
> carmodels
> carnamesandmodels
> users
> dealerlocations
> When running your pipeline, you want to see if you have already created the 
> temp join table 'carnamesandmodels'. However, you have stored its name in a 
> constant which reads CarNamesAndModels for readability.
> So you call
> sqlContext.tableNames().contains("CarNamesAndModels").
> This should return true, because we know the table is already created, but 
> it currently returns false since "CarNamesAndModels" is not in lower case.
> The responsibility for lower-casing the name passed into .contains() should 
> not fall on the Spark user. Spark SQL should do this when case sensitivity 
> is turned off.
> Proposed solutions:
> - Setting case sensitivity in the SQL context should make the context 
> agnostic to case without changing how table names are stored
> - There should be a custom contains method for listTables() which converts 
> the table name to lower case before checking
> - SparkSession.Catalog.listTables() should return the list of tables in the 
> input format instead of all in lower case.
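The lookup the reporter describes can be sketched outside Spark. This is a
minimal illustration only: the `contains_table` helper and the hard-coded table
list are my own, assuming (as described above) that Spark SQL has stored the
five example tables lower-cased with spark.sql.caseSensitive=false.

```python
def contains_table(table_names, name, case_sensitive=False):
    """Check whether `name` is in `table_names`, honoring case sensitivity.

    With case sensitivity off, "CarNamesAndModels" should match the
    stored "carnamesandmodels" -- the behavior the ticket asks for.
    """
    if case_sensitive:
        return name in table_names
    # Case-insensitive membership test: compare lower-cased forms.
    return name.lower() in (t.lower() for t in table_names)


# The five tables from the example, stored lower-cased by Spark SQL:
tables = ["carnames", "carmodels", "carnamesandmodels", "users",
          "dealerlocations"]

# A plain membership check (what .contains does today) misses the
# camel-case constant, while the case-insensitive lookup finds it:
print("CarNamesAndModels" in tables)            # False
print(contains_table(tables, "CarNamesAndModels"))  # True
```

Until the catalog is case preserving, a user must apply this kind of
normalization manually, which is exactly the burden the report argues
should sit inside Spark SQL instead.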



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21158) SparkSQL function SparkSession.Catalog.ListTables() does not handle spark setting for case-sensitivity

2017-06-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16063561#comment-16063561
 ] 

Apache Spark commented on SPARK-21158:
--

User 'cammachusa' has created a pull request for this issue:
https://github.com/apache/spark/pull/18423



[jira] [Commented] (SPARK-21158) SparkSQL function SparkSession.Catalog.ListTables() does not handle spark setting for case-sensitivity

2017-06-21 Thread Kathryn McClintic (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058577#comment-16058577
 ] 

Kathryn McClintic commented on SPARK-21158:
---

I'm fine with that from my perspective.



[jira] [Commented] (SPARK-21158) SparkSQL function SparkSession.Catalog.ListTables() does not handle spark setting for case-sensitivity

2017-06-20 Thread Cam Quoc Mach (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16056758#comment-16056758
 ] 

Cam Quoc Mach commented on SPARK-21158:
---

Can I take this task?
