[jira] [Updated] (SPARK-42752) Unprintable IllegalArgumentException with Hive catalog enabled in "Hadoop Free" distribution

2023-03-15 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-42752:
--
Affects Version/s: (was: 3.1.3)
   (was: 3.2.4)
   (was: 3.4.1)
   (was: 3.3.3)

> Unprintable IllegalArgumentException with Hive catalog enabled in "Hadoop 
> Free" distibution
> ---
>
> Key: SPARK-42752
> URL: https://issues.apache.org/jira/browse/SPARK-42752
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.5.0
> Environment: local
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Minor
> Fix For: 3.5.0
>
>
> Reproduction steps:
> 1. Download a standard "Hadoop Free" build
> 2. Start the PySpark REPL with Hive support
> {code:java}
> SPARK_DIST_CLASSPATH=$(~/dist/hadoop-3.4.0-SNAPSHOT/bin/hadoop classpath) ~/dist/spark-3.2.3-bin-without-hadoop/bin/pyspark --conf spark.sql.catalogImplementation=hive
> {code}
> 3. Execute any simple DataFrame operation
> {code:java}
> >>> spark.range(100).show()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/session.py", line 416, in range
>     jdf = self._jsparkSession.range(0, int(start), int(step), int(numPartitions))
>   File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
>   File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/utils.py", line 117, in deco
>     raise converted from None
> pyspark.sql.utils.IllegalArgumentException: 
> {code}
> 4. In fact, you can trigger this issue just by accessing spark.conf
> {code:java}
> >>> spark.conf
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ...
> {code}
> There are probably two issues here:
> 1) Hive support should be gracefully disabled if the dependency is not on the classpath, as claimed by https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html
> 2) at the very least, the user should be able to see the exception so they can understand the issue and take action
>  
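
For anyone hitting this on a released build, a possible interim workaround related to point (2) in the quoted report: either skip the Hive catalog on the "Hadoop Free" distribution, or turn on the existing spark.sql.pyspark.jvmStacktrace.enabled setting so the JVM-side stack trace is printed next to the (empty) message. Both configs are standard Spark settings, but whether the second one actually reveals the root cause in this exact failure mode is an assumption. A minimal PySpark sketch:

{code:python}
from pyspark.sql import SparkSession

# Option 1 (assumption: avoids the failure entirely): stay on the built-in
# catalog, since the "Hadoop Free" distribution does not bundle Hive support.
spark = (
    SparkSession.builder
    .appName("hadoop-free-no-hive")
    .config("spark.sql.catalogImplementation", "in-memory")
    .getOrCreate()
)

# Option 2 (assumption: this surfaces the hidden cause): keep the Hive catalog
# but ask PySpark to append the JVM stack trace to converted exceptions, so an
# empty exception message is still diagnosable.
# spark = (
#     SparkSession.builder
#     .appName("hadoop-free-hive-debug")
#     .config("spark.sql.catalogImplementation", "hive")
#     .config("spark.sql.pyspark.jvmStacktrace.enabled", "true")
#     .getOrCreate()
# )

spark.range(100).show()  # the operation from the reproduction steps
{code}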






[jira] [Updated] (SPARK-42752) Unprintable IllegalArgumentException with Hive catalog enabled in "Hadoop Free" distribution

2023-03-14 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-42752:
-
Priority: Minor  (was: Major)

> Unprintable IllegalArgumentException with Hive catalog enabled in "Hadoop 
> Free" distibution
> ---
>
> Key: SPARK-42752
> URL: https://issues.apache.org/jira/browse/SPARK-42752
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.1.3, 3.2.4, 3.3.3, 3.4.1, 3.5.0
> Environment: local
>Reporter: Gera Shegalov
>Priority: Minor
>
> Reproduction steps:
> 1. Download a standard "Hadoop Free" build
> 2. Start the PySpark REPL with Hive support
> {code:java}
> SPARK_DIST_CLASSPATH=$(~/dist/hadoop-3.4.0-SNAPSHOT/bin/hadoop classpath) ~/dist/spark-3.2.3-bin-without-hadoop/bin/pyspark --conf spark.sql.catalogImplementation=hive
> {code}
> 3. Execute any simple DataFrame operation
> {code:java}
> >>> spark.range(100).show()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/session.py", line 416, in range
>     jdf = self._jsparkSession.range(0, int(start), int(step), int(numPartitions))
>   File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
>   File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/utils.py", line 117, in deco
>     raise converted from None
> pyspark.sql.utils.IllegalArgumentException: 
> {code}
> 4. In fact, you can trigger this issue just by accessing spark.conf
> {code:java}
> >>> spark.conf
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ...
> {code}
> There are probably two issues here:
> 1) Hive support should be gracefully disabled if the dependency is not on the classpath, as claimed by https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html
> 2) at the very least, the user should be able to see the exception so they can understand the issue and take action
>  






[jira] [Updated] (SPARK-42752) Unprintable IllegalArgumentException with Hive catalog enabled in "Hadoop Free" distribution

2023-03-14 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-42752:
-
Issue Type: Improvement  (was: Bug)

> Unprintable IllegalArgumentException with Hive catalog enabled in "Hadoop 
> Free" distibution
> ---
>
> Key: SPARK-42752
> URL: https://issues.apache.org/jira/browse/SPARK-42752
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.1.3, 3.2.4, 3.3.3, 3.4.1, 3.5.0
> Environment: local
>Reporter: Gera Shegalov
>Priority: Major
>
> Reproduction steps:
> 1. Download a standard "Hadoop Free" build
> 2. Start the PySpark REPL with Hive support
> {code:java}
> SPARK_DIST_CLASSPATH=$(~/dist/hadoop-3.4.0-SNAPSHOT/bin/hadoop classpath) ~/dist/spark-3.2.3-bin-without-hadoop/bin/pyspark --conf spark.sql.catalogImplementation=hive
> {code}
> 3. Execute any simple DataFrame operation
> {code:java}
> >>> spark.range(100).show()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/session.py", line 416, in range
>     jdf = self._jsparkSession.range(0, int(start), int(step), int(numPartitions))
>   File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
>   File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/utils.py", line 117, in deco
>     raise converted from None
> pyspark.sql.utils.IllegalArgumentException: 
> {code}
> 4. In fact, you can trigger this issue just by accessing spark.conf
> {code:java}
> >>> spark.conf
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ...
> {code}
> There are probably two issues here:
> 1) Hive support should be gracefully disabled if the dependency is not on the classpath, as claimed by https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html
> 2) at the very least, the user should be able to see the exception so they can understand the issue and take action
>  






[jira] [Updated] (SPARK-42752) Unprintable IllegalArgumentException with Hive catalog enabled in "Hadoop Free" distribution

2023-03-10 Thread Gera Shegalov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated SPARK-42752:
--
Description: 
Reproduction steps:
1. Download a standard "Hadoop Free" build
2. Start the PySpark REPL with Hive support
{code:java}
SPARK_DIST_CLASSPATH=$(~/dist/hadoop-3.4.0-SNAPSHOT/bin/hadoop classpath) ~/dist/spark-3.2.3-bin-without-hadoop/bin/pyspark --conf spark.sql.catalogImplementation=hive
{code}
3. Execute any simple DataFrame operation
{code:java}
>>> spark.range(100).show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/session.py", line 416, in range
    jdf = self._jsparkSession.range(0, int(start), int(step), int(numPartitions))
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.IllegalArgumentException: 
{code}
4. In fact, you can trigger this issue just by accessing spark.conf
{code:java}
>>> spark.conf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
...
{code}

There are probably two issues here:
1) Hive support should be gracefully disabled if the dependency is not on the classpath, as claimed by https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html (a sketch of such a check follows below)
2) at the very least, the user should be able to see the exception so they can understand the issue and take action
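
Regarding point (1), a sketch of how an application could approximate the graceful fallback before Spark provides it: probe the local distribution for the Hive support jars and only request the Hive catalog when they are present. The jar-name check below is a heuristic assumption used for illustration, not Spark's own detection logic, and the helper name is hypothetical.

{code:python}
import glob
import os

from pyspark.sql import SparkSession

def hive_support_available(spark_home: str) -> bool:
    # Heuristic (an assumption, not Spark's own check): the Hive catalog needs the
    # spark-hive module and the Hive client jars on the driver classpath, which the
    # "Hadoop Free" distribution does not bundle by default.
    jar_names = [os.path.basename(p)
                 for p in glob.glob(os.path.join(spark_home, "jars", "*.jar"))]
    has_spark_hive = any(name.startswith("spark-hive_") for name in jar_names)
    has_hive_client = any(name.startswith("hive-") for name in jar_names)
    return has_spark_hive and has_hive_client

builder = SparkSession.builder.appName("hadoop-free")
if hive_support_available(os.environ.get("SPARK_HOME", ".")):
    builder = builder.enableHiveSupport()  # sets spark.sql.catalogImplementation=hive
spark = builder.getOrCreate()              # otherwise fall back to the in-memory catalog
{code}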

 

  was:
Reproduction steps:
1. Download a standard "Hadoop Free" build
2. Start the PySpark REPL with Hive support
{code:java}
SPARK_DIST_CLASSPATH=$(~/dist/hadoop-3.4.0-SNAPSHOT/bin/hadoop classpath) ~/dist/spark-3.2.3-bin-without-hadoop/bin/pyspark --conf spark.sql.catalogImplementation=hive
{code}
3. Execute any simple DataFrame operation
{code:java}
>>> spark.range(100).show()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/session.py", line 416, in range
    jdf = self._jsparkSession.range(0, int(start), int(step), int(numPartitions))
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.IllegalArgumentException: 
>>> spark.conf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/session.py", line 347, in conf
    self._conf = RuntimeConfig(self._jsparkSession.conf())
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
  File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.IllegalArgumentException: 
{code}
4. In fact, you can trigger this issue just by accessing spark.conf
{code:java}
>>> spark.conf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
...
{code}

There are probably two issues here:
1) Hive support should be gracefully disabled if the dependency is not on the classpath, as claimed by https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html
2) at the very least, the user should be able to see the exception so they can understand the issue and take action

 


> Unprintable IllegalArgumentException with Hive catalog enabled in "Hadoop 
> Free" distibution
> ---
>
> Key: SPARK-42752
> URL: https://issues.apache.org/jira/browse/SPARK-42752
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.1.3, 3.2.4, 3.3.3, 3.4.1, 3.5.0
> Environment: local
>Reporter: Gera Shegalov
>Priority: Major
>
> Reproduction steps:
> 1. Download a standard "Hadoop Free" build
> 2. Start the PySpark REPL with Hive support
> {code:java}
> SPARK_DIST_CLASSPATH=$(~/dist/hadoop-3.4.0-SNAPSHOT/bin/hadoop classpath) ~/dist/spark-3.2.3-bin-without-hadoop/bin/pyspark --conf spark.sql.catalogImplementation=hive
> {code}
> 3. Execute any simple DataFrame operation
> {code:java}
> >>> spark.range(100).show()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/pyspark/sql/session.py", line 416, in range
>     jdf = self._jsparkSession.range(0, int(start), int(step), int(numPartitions))
>   File "/home/user/dist/spark-3.2.3-bin-without-hadoop/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in