[jira] [Created] (SPARK-29920) Parsing failure on interval '20 15' day to hour

2019-11-15 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29920:
--

 Summary: Parsing failure on interval '20 15' day to hour
 Key: SPARK-29920
 URL: https://issues.apache.org/jira/browse/SPARK-29920
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk



{code:sql}
spark-sql> select interval '20 15' day to hour;
Error in query:
requirement failed: Interval string must match day-time format of 'd h:m:s.n': 
20 15(line 1, pos 16)

== SQL ==
select interval '20 15' day to hour
^^^
{code}
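
For context, a hedged note (not from the ticket): the failing literal is presumably meant to denote 20 days and 15 hours. A minimal check of that value with the multi-unit syntax, assuming a recent 3.0 snapshot in spark-shell:

{code:scala}
// Assumption: '20 15' DAY TO HOUR should equal 20 days 15 hours.
// The multi-unit form parses, so only the day-to-hour form seems affected.
spark.sql("select interval 20 days 15 hours").show(false)
{code}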







[jira] [Created] (SPARK-29927) Parse timestamps in microsecond precision by `to_timestamp`, `to_unix_timestamp`, `unix_timestamp`

2019-11-16 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29927:
--

 Summary: Parse timestamps in microsecond precision by 
`to_timestamp`, `to_unix_timestamp`, `unix_timestamp`
 Key: SPARK-29927
 URL: https://issues.apache.org/jira/browse/SPARK-29927
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


Currently, the `to_timestamp`, `to_unix_timestamp`, `unix_timestamp` functions 
use SimpleDateFormat to parse strings to timestamps. SimpleDateFormat can parse 
only up to millisecond precision when a user specifies `SSS` in a pattern. The 
ticket aims to support parsing up to microsecond precision.
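
As an illustration (a minimal sketch, not taken from the ticket), the difference 
between the legacy parser and a java.time formatter on a microsecond-precision string:

{code:scala}
import java.text.SimpleDateFormat
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// SimpleDateFormat reads the whole fraction "123456" as milliseconds,
// so the parsed instant drifts by roughly two minutes.
val sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
println(sdf.parse("2019-11-16 11:56:00.123456"))

// A java.time formatter with a six-digit fraction keeps the microseconds.
val dtf = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS")
println(LocalDateTime.parse("2019-11-16 11:56:00.123456", dtf))
{code}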






[jira] [Commented] (SPARK-29927) Parse timestamps in microsecond precision by `to_timestamp`, `to_unix_timestamp`, `unix_timestamp`

2019-11-16 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975697#comment-16975697
 ] 

Maxim Gekk commented on SPARK-29927:


[~cloud_fan] WDYT, does it make sense to change the functions as well?

> Parse timestamps in microsecond precision by `to_timestamp`, 
> `to_unix_timestamp`, `unix_timestamp`
> --
>
> Key: SPARK-29927
> URL: https://issues.apache.org/jira/browse/SPARK-29927
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Maxim Gekk
>Priority: Major
>
> Currently, the `to_timestamp`, `to_unix_timestamp`, `unix_timestamp` 
> functions use SimpleDateFormat to parse strings to timestamps. 
> SimpleDateFormat can parse only up to millisecond precision when a user 
> specifies `SSS` in a pattern. The ticket aims to support parsing up to 
> microsecond precision.






[jira] [Updated] (SPARK-29904) Parse timestamps in microsecond precision by JSON/CSV datasources

2019-11-16 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-29904:
---
Affects Version/s: 2.4.0
   2.4.1
   2.4.2
   2.4.3

> Parse timestamps in microsecond precision by JSON/CSV datasources
> -
>
> Key: SPARK-29904
> URL: https://issues.apache.org/jira/browse/SPARK-29904
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 2.4.5
>
>
> Currently, Spark can parse strings with timestamps from JSON/CSV only in 
> millisecond precision. Internally, timestamps have microsecond precision. The 
> ticket aims to modify the parsing logic in Spark 2.4 to support microsecond 
> precision. Porting DateFormatter/TimestampFormatter from Spark 3.0-preview 
> is risky, so we need to find another, lighter solution.






[jira] [Created] (SPARK-29928) Check parsing timestamps up to microsecond precision by JSON/CSV datasource

2019-11-16 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29928:
--

 Summary: Check parsing timestamps up to microsecond precision by 
JSON/CSV datasource
 Key: SPARK-29928
 URL: https://issues.apache.org/jira/browse/SPARK-29928
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Port tests added for 2.4 by the commit: 
https://github.com/apache/spark/commit/9c7e8be1dca8285296f3052c41f35043699d7d10






[jira] [Created] (SPARK-29930) Remove SQL configs declared to be removed in Spark 3.0

2019-11-16 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29930:
--

 Summary: Remove SQL configs declared to be removed in Spark 3.0
 Key: SPARK-29930
 URL: https://issues.apache.org/jira/browse/SPARK-29930
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Need to remove the following SQL configs:
* spark.sql.fromJsonForceNullableSchema
* spark.sql.legacy.compareDateTimestampInTimestamp






[jira] [Created] (SPARK-29931) Declare all SQL legacy configs as will be removed in Spark 4.0

2019-11-16 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29931:
--

 Summary: Declare all SQL legacy configs as will be removed in 
Spark 4.0
 Key: SPARK-29931
 URL: https://issues.apache.org/jira/browse/SPARK-29931
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Add the following sentence to the descriptions of all legacy SQL configs that 
existed before Spark 3.0: "This config will be removed in Spark 4.0." Here is 
the list of such configs (a sketch of the proposed wording follows the list):
* spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName
* spark.sql.legacy.literal.pickMinimumPrecision
* spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation
* spark.sql.legacy.sizeOfNull
* spark.sql.legacy.replaceDatabricksSparkAvro.enabled
* spark.sql.legacy.setopsPrecedence.enabled
* spark.sql.legacy.integralDivide.returnBigint
* spark.sql.legacy.bucketedTableScan.outputOrdering
* spark.sql.legacy.parser.havingWithoutGroupByAsWhere
* spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue
* spark.sql.legacy.setCommandRejectsSparkCoreConfs
* spark.sql.legacy.utcTimestampFunc.enabled
* spark.sql.legacy.typeCoercion.datetimeToString
* spark.sql.legacy.looseUpcast
* spark.sql.legacy.ctePrecedence.enabled
* spark.sql.legacy.arrayExistsFollowsThreeValuedLogic
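
A sketch of the proposed wording on one of the configs (hypothetical; the config 
name, doc text and default are used as an example only, inside SQLConf):

{code:scala}
// Hypothetical sketch: append the removal note to the config's doc string.
val LEGACY_SIZE_OF_NULL = buildConf("spark.sql.legacy.sizeOfNull")
  .doc("If it is set to true, size of null returns -1. " +
    "This config will be removed in Spark 4.0.")
  .booleanConf
  .createWithDefault(true)
{code}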






[jira] [Commented] (SPARK-29931) Declare all SQL legacy configs as will be removed in Spark 4.0

2019-11-16 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975813#comment-16975813
 ] 

Maxim Gekk commented on SPARK-29931:


[~rxin] [~lixiao] [~srowen] [~dongjoon] [~cloud_fan] [~hyukjin.kwon] Does this 
make sense to you?

> Declare all SQL legacy configs as will be removed in Spark 4.0
> --
>
> Key: SPARK-29931
> URL: https://issues.apache.org/jira/browse/SPARK-29931
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> Add the following sentence to the descriptions of all legacy SQL configs that 
> existed before Spark 3.0: "This config will be removed in Spark 4.0." Here is 
> the list of such configs:
> * spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName
> * spark.sql.legacy.literal.pickMinimumPrecision
> * spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation
> * spark.sql.legacy.sizeOfNull
> * spark.sql.legacy.replaceDatabricksSparkAvro.enabled
> * spark.sql.legacy.setopsPrecedence.enabled
> * spark.sql.legacy.integralDivide.returnBigint
> * spark.sql.legacy.bucketedTableScan.outputOrdering
> * spark.sql.legacy.parser.havingWithoutGroupByAsWhere
> * spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue
> * spark.sql.legacy.setCommandRejectsSparkCoreConfs
> * spark.sql.legacy.utcTimestampFunc.enabled
> * spark.sql.legacy.typeCoercion.datetimeToString
> * spark.sql.legacy.looseUpcast
> * spark.sql.legacy.ctePrecedence.enabled
> * spark.sql.legacy.arrayExistsFollowsThreeValuedLogic






[jira] [Commented] (SPARK-29931) Declare all SQL legacy configs as will be removed in Spark 4.0

2019-11-17 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975944#comment-16975944
 ] 

Maxim Gekk commented on SPARK-29931:


> It's conceivable there could be a reason to do it later, or sooner.

Later is not a problem; the concern is sooner. Most of the configs were added for 
Spark 3.0. If one of them is removed in a minor release between 3.0 and 4.0, it 
can break user apps, which I believe is unacceptable for minor releases.

> Declare all SQL legacy configs as will be removed in Spark 4.0
> --
>
> Key: SPARK-29931
> URL: https://issues.apache.org/jira/browse/SPARK-29931
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> Add the following sentence to the descriptions of all legacy SQL configs that 
> existed before Spark 3.0: "This config will be removed in Spark 4.0." Here is 
> the list of such configs:
> * spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName
> * spark.sql.legacy.literal.pickMinimumPrecision
> * spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation
> * spark.sql.legacy.sizeOfNull
> * spark.sql.legacy.replaceDatabricksSparkAvro.enabled
> * spark.sql.legacy.setopsPrecedence.enabled
> * spark.sql.legacy.integralDivide.returnBigint
> * spark.sql.legacy.bucketedTableScan.outputOrdering
> * spark.sql.legacy.parser.havingWithoutGroupByAsWhere
> * spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue
> * spark.sql.legacy.setCommandRejectsSparkCoreConfs
> * spark.sql.legacy.utcTimestampFunc.enabled
> * spark.sql.legacy.typeCoercion.datetimeToString
> * spark.sql.legacy.looseUpcast
> * spark.sql.legacy.ctePrecedence.enabled
> * spark.sql.legacy.arrayExistsFollowsThreeValuedLogic






[jira] [Created] (SPARK-29933) ThriftServerQueryTestSuite runs tests with wrong settings

2019-11-17 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29933:
--

 Summary: ThriftServerQueryTestSuite runs tests with wrong settings
 Key: SPARK-29933
 URL: https://issues.apache.org/jira/browse/SPARK-29933
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


ThriftServerQueryTestSuite must run ANSI tests in the Spark dialect, but it 
keeps settings from previous runs. In fact, it runs `ansi/interval.sql` in 
the PostgreSQL dialect. See 
https://github.com/apache/spark/pull/26473#issuecomment-554510643
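
One possible direction (a sketch under assumptions, not the attached patch): reset 
the session state between test files so that a dialect set by one file cannot leak 
into the next, for example:

{code:scala}
// Sketch: clear SQL configs left over from the previous test file.
// `statement` is assumed to be the JDBC Statement the suite already uses.
statement.execute("RESET")
{code}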






[jira] [Updated] (SPARK-29933) ThriftServerQueryTestSuite runs tests with wrong settings

2019-11-17 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-29933:
---
Attachment: filter_tests.patch

> ThriftServerQueryTestSuite runs tests with wrong settings
> -
>
> Key: SPARK-29933
> URL: https://issues.apache.org/jira/browse/SPARK-29933
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
> Attachments: filter_tests.patch
>
>
> ThriftServerQueryTestSuite must run ANSI tests in the Spark dialect, but it 
> keeps settings from previous runs. In fact, it runs `ansi/interval.sql` in 
> the PostgreSQL dialect. See 
> https://github.com/apache/spark/pull/26473#issuecomment-554510643






[jira] [Commented] (SPARK-29758) json_tuple truncates fields

2019-11-17 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976099#comment-16976099
 ] 

Maxim Gekk commented on SPARK-29758:


I have reproduced the issue on 2.4. The problem is in Jackson core 2.6.7. It 
was fixed by 
https://github.com/FasterXML/jackson-core/commit/554f8db0f940b2a53f974852a2af194739d65200#diff-7990edc67621822770cdc62e12d933d4R647-R650
 in version 2.7.7. We could try to backport 
https://github.com/apache/spark/pull/21596 to 2.4. [~hyukjin.kwon] WDYT?

> json_tuple truncates fields
> ---
>
> Key: SPARK-29758
> URL: https://issues.apache.org/jira/browse/SPARK-29758
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.4.4
> Environment: EMR 5.15.0 (Spark 2.3.0) And MacBook Pro (Mojave 
> 10.14.3, Spark 2.4.4)
> Jdk 8, Scala 2.11.12
>Reporter: Stanislav
>Priority: Major
>
> `json_tuple` has inconsistent behaviour with `from_json` - but only if json 
> string is longer than 2700 characters or so.
> This can be reproduced in spark-shell and on cluster, but not in scalatest, 
> for some reason.
> {code}
> import org.apache.spark.sql.functions.{from_json, json_tuple}
> import org.apache.spark.sql.types._
> val counterstring = 
> "*3*5*7*9*12*15*18*21*24*27*30*33*36*39*42*45*48*51*54*57*60*63*66*69*72*75*78*81*84*87*90*93*96*99*103*107*111*115*119*123*127*131*135*139*143*147*151*155*159*163*167*171*175*179*183*187*191*195*199*203*207*211*215*219*223*227*231*235*239*243*247*251*255*259*263*267*271*275*279*283*287*291*295*299*303*307*311*315*319*323*327*331*335*339*343*347*351*355*359*363*367*371*375*379*383*387*391*395*399*403*407*411*415*419*423*427*431*435*439*443*447*451*455*459*463*467*471*475*479*483*487*491*495*499*503*507*511*515*519*523*527*531*535*539*543*547*551*555*559*563*567*571*575*579*583*587*591*595*599*603*607*611*615*619*623*627*631*635*639*643*647*651*655*659*663*667*671*675*679*683*687*691*695*699*703*707*711*715*719*723*727*731*735*739*743*747*751*755*759*763*767*771*775*779*783*787*791*795*799*803*807*811*815*819*823*827*831*835*839*843*847*851*855*859*863*867*871*875*879*883*887*891*895*899*903*907*911*915*919*923*927*931*935*939*943*947*951*955*959*963*967*971*975*979*983*987*991*995*1000*1005*1010*1015*1020*1025*1030*1035*1040*1045*1050*1055*1060*1065*1070*1075*1080*1085*1090*1095*1100*1105*1110*1115*1120*1125*1130*1135*1140*1145*1150*1155*1160*1165*1170*1175*1180*1185*1190*1195*1200*1205*1210*1215*1220*1225*1230*1235*1240*1245*1250*1255*1260*1265*1270*1275*1280*1285*1290*1295*1300*1305*1310*1315*1320*1325*1330*1335*1340*1345*1350*1355*1360*1365*1370*1375*1380*1385*1390*1395*1400*1405*1410*1415*1420*1425*1430*1435*1440*1445*1450*1455*1460*1465*1470*1475*1480*1485*1490*1495*1500*1505*1510*1515*1520*1525*1530*1535*1540*1545*1550*1555*1560*1565*1570*1575*1580*1585*1590*1595*1600*1605*1610*1615*1620*1625*1630*1635*1640*1645*1650*1655*1660*1665*1670*1675*1680*1685*1690*1695*1700*1705*1710*1715*1720*1725*1730*1735*1740*1745*1750*1755*1760*1765*1770*1775*1780*1785*1790*1795*1800*1805*1810*1815*1820*1825*1830*1835*1840*1845*1850*1855*1860*1865*1870*1875*1880*1885*1890*1895*1900*1905*1910*1915*1920*1925*1930*1935*1940*1945*1950*1955*1960*1965*1970*1975*1980*1985*1990*1995*2000*2005*2010*2015*2020*2025*2030*2035*2040*2045*2050*2055*2060*2065*2070*2075*2080*2085*2090*2095*2100*2105*2110*2115*2120*2125*2130*2135*2140*2145*2150*2155*2160*2165*2170*2175*2180*2185*2190*2195*2200*2205*2210*2215*2220*2225*2230*2235*2240*2245*2250*2255*2260*2265*2270*2275*2280*2285*2290*2295*2300*2305*2310*2315*2320*2325*2330*2335*2340*2345*2350*2355*2360*2365*2370*2375*2380*2385*2390*2395*2400*2405*2410*2415*2420*2425*2430*2435*2440*2445*2450*2455*2460*2465*2470*2475*2480*2485*2490*2495*2500*2505*2510*2515*2520*2525*2530*2535*2540*2545*2550*2555*2560*2565*2570*2575*2580*2585*2590*2595*2600*2605*2610*2615*2620*2625*2630*2635*2640*2645*2650*2655*2660*2665*2670*2675*2680*2685*2690*2695*2700*2705*2710*2715*2720*2725*2730*2735*2740*2745*2750*2755*2760*2765*2770*2775*2780*2785*2790*2795*2800*"
> val json_tuple_result = Seq(s"""{"test":"$counterstring"}""").toDF("json")
>   .withColumn("result", json_tuple('json, "test"))
>   .select('result)
>   .as[String].head.length
> val from_json_result = Seq(s"""{"test":"$counterstring"}""").toDF("json")
>   .withColumn("parsed", from_json('json, StructType(Seq(StructField("test", 
> StringType)))))
>   .withColumn("result", $"parsed.test")
>   .select('result)
>   .as[String].head.length
> scala> json_tuple_result
> res62: Int = 2791
> scala> from_json_result
> res63: Int = 2800
> {code}
> Result is influenced by the total length of the json string at the moment of 
> parsing:
> {code}
> val json_tuple_result_with_prefix = Seq(s"""{"prefix": "dummy", 
> "test":"$counterstring"}""").toDF("jso

[jira] [Commented] (SPARK-29575) from_json can produce nulls for fields which are marked as non-nullable

2019-11-17 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976102#comment-16976102
 ] 

Maxim Gekk commented on SPARK-29575:


This is intentional behavior. The user's schema is forcibly set as nullable. 
See SPARK-23173.
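
A minimal sketch of the observed behavior in the Scala API (illustrative only; 
field names and values assumed):

{code:scala}
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._
import spark.implicits._

// Both fields are declared nullable = false, yet the missing "id" comes back
// as null because from_json treats the user-provided schema as nullable.
val schema = new StructType()
  .add(StructField("id", LongType, nullable = false))
  .add(StructField("name", StringType, nullable = false))

Seq("""{"name": "jane"}""").toDF("user")
  .select(from_json($"user", schema).as("details"))
  .select("details.*")
  .show()
{code}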

> from_json can produce nulls for fields which are marked as non-nullable
> ---
>
> Key: SPARK-29575
> URL: https://issues.apache.org/jira/browse/SPARK-29575
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.4
>Reporter: Victor Lopez
>Priority: Major
>
> I believe this issue was resolved elsewhere 
> (https://issues.apache.org/jira/browse/SPARK-23173), though for Pyspark this 
> bug seems to still be there.
> The issue appears when using {{from_json}} to parse a column in a Spark 
> dataframe. It seems like {{from_json}} ignores whether the schema provided 
> has any {{nullable:False}} property.
> {code:java}
> schema = T.StructType().add(T.StructField('id', T.LongType(), 
> nullable=False)).add(T.StructField('name', T.StringType(), nullable=False))
> data = [{'user': str({'name': 'joe', 'id':1})}, {'user': str({'name': 
> 'jane'})}]
> df = spark.read.json(sc.parallelize(data))
> df.withColumn("details", F.from_json("user", 
> schema)).select("details.*").show()
> {code}
>  






[jira] [Commented] (SPARK-29758) json_tuple truncates fields

2019-11-17 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976106#comment-16976106
 ] 

Maxim Gekk commented on SPARK-29758:


Another solution is to remove this optimization: 
https://github.com/apache/spark/blob/v2.4.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L475-L478

> json_tuple truncates fields
> ---
>
> Key: SPARK-29758
> URL: https://issues.apache.org/jira/browse/SPARK-29758
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.4.4
> Environment: EMR 5.15.0 (Spark 2.3.0) And MacBook Pro (Mojave 
> 10.14.3, Spark 2.4.4)
> Jdk 8, Scala 2.11.12
>Reporter: Stanislav
>Priority: Major
>
> `json_tuple` has inconsistent behaviour with `from_json` - but only if json 
> string is longer than 2700 characters or so.
> This can be reproduced in spark-shell and on cluster, but not in scalatest, 
> for some reason.
> {code}
> import org.apache.spark.sql.functions.{from_json, json_tuple}
> import org.apache.spark.sql.types._
> val counterstring = 
> "*3*5*7*9*12*15*18*21*24*27*30*33*36*39*42*45*48*51*54*57*60*63*66*69*72*75*78*81*84*87*90*93*96*99*103*107*111*115*119*123*127*131*135*139*143*147*151*155*159*163*167*171*175*179*183*187*191*195*199*203*207*211*215*219*223*227*231*235*239*243*247*251*255*259*263*267*271*275*279*283*287*291*295*299*303*307*311*315*319*323*327*331*335*339*343*347*351*355*359*363*367*371*375*379*383*387*391*395*399*403*407*411*415*419*423*427*431*435*439*443*447*451*455*459*463*467*471*475*479*483*487*491*495*499*503*507*511*515*519*523*527*531*535*539*543*547*551*555*559*563*567*571*575*579*583*587*591*595*599*603*607*611*615*619*623*627*631*635*639*643*647*651*655*659*663*667*671*675*679*683*687*691*695*699*703*707*711*715*719*723*727*731*735*739*743*747*751*755*759*763*767*771*775*779*783*787*791*795*799*803*807*811*815*819*823*827*831*835*839*843*847*851*855*859*863*867*871*875*879*883*887*891*895*899*903*907*911*915*919*923*927*931*935*939*943*947*951*955*959*963*967*971*975*979*983*987*991*995*1000*1005*1010*1015*1020*1025*1030*1035*1040*1045*1050*1055*1060*1065*1070*1075*1080*1085*1090*1095*1100*1105*1110*1115*1120*1125*1130*1135*1140*1145*1150*1155*1160*1165*1170*1175*1180*1185*1190*1195*1200*1205*1210*1215*1220*1225*1230*1235*1240*1245*1250*1255*1260*1265*1270*1275*1280*1285*1290*1295*1300*1305*1310*1315*1320*1325*1330*1335*1340*1345*1350*1355*1360*1365*1370*1375*1380*1385*1390*1395*1400*1405*1410*1415*1420*1425*1430*1435*1440*1445*1450*1455*1460*1465*1470*1475*1480*1485*1490*1495*1500*1505*1510*1515*1520*1525*1530*1535*1540*1545*1550*1555*1560*1565*1570*1575*1580*1585*1590*1595*1600*1605*1610*1615*1620*1625*1630*1635*1640*1645*1650*1655*1660*1665*1670*1675*1680*1685*1690*1695*1700*1705*1710*1715*1720*1725*1730*1735*1740*1745*1750*1755*1760*1765*1770*1775*1780*1785*1790*1795*1800*1805*1810*1815*1820*1825*1830*1835*1840*1845*1850*1855*1860*1865*1870*1875*1880*1885*1890*1895*1900*1905*1910*1915*1920*1925*1930*1935*1940*1945*1950*1955*1960*1965*1970*1975*1980*1985*1990*1995*2000*2005*2010*2015*2020*2025*2030*2035*2040*2045*2050*2055*2060*2065*2070*2075*2080*2085*2090*2095*2100*2105*2110*2115*2120*2125*2130*2135*2140*2145*2150*2155*2160*2165*2170*2175*2180*2185*2190*2195*2200*2205*2210*2215*2220*2225*2230*2235*2240*2245*2250*2255*2260*2265*2270*2275*2280*2285*2290*2295*2300*2305*2310*2315*2320*2325*2330*2335*2340*2345*2350*2355*2360*2365*2370*2375*2380*2385*2390*2395*2400*2405*2410*2415*2420*2425*2430*2435*2440*2445*2450*2455*2460*2465*2470*2475*2480*2485*2490*2495*2500*2505*2510*2515*2520*2525*2530*2535*2540*2545*2550*2555*2560*2565*2570*2575*2580*2585*2590*2595*2600*2605*2610*2615*2620*2625*2630*2635*2640*2645*2650*2655*2660*2665*2670*2675*2680*2685*2690*2695*2700*2705*2710*2715*2720*2725*2730*2735*2740*2745*2750*2755*2760*2765*2770*2775*2780*2785*2790*2795*2800*"
> val json_tuple_result = Seq(s"""{"test":"$counterstring"}""").toDF("json")
>   .withColumn("result", json_tuple('json, "test"))
>   .select('result)
>   .as[String].head.length
> val from_json_result = Seq(s"""{"test":"$counterstring"}""").toDF("json")
>   .withColumn("parsed", from_json('json, StructType(Seq(StructField("test", 
> StringType)))))
>   .withColumn("result", $"parsed.test")
>   .select('result)
>   .as[String].head.length
> scala> json_tuple_result
> res62: Int = 2791
> scala> from_json_result
> res63: Int = 2800
> {code}
> Result is influenced by the total length of the json string at the moment of 
> parsing:
> {code}
> val json_tuple_result_with_prefix = Seq(s"""{"prefix": "dummy", 
> "test":"$counterstring"}""").toDF("json")
>   .withColumn("result", json_tuple('json, "test"))
>   .select('result)
>   .as[String].head.length
> scala> json_tuple_result_with_prefix
> res64: Int = 27

[jira] [Comment Edited] (SPARK-29758) json_tuple truncates fields

2019-11-17 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976106#comment-16976106
 ] 

Maxim Gekk edited comment on SPARK-29758 at 11/17/19 6:17 PM:
--

Another solution is to disable this optimization: 
[https://github.com/apache/spark/blob/v2.4.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L475-L478]


was (Author: maxgekk):
Another solution is to remove this optimization: 
https://github.com/apache/spark/blob/v2.4.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L475-L478

> json_tuple truncates fields
> ---
>
> Key: SPARK-29758
> URL: https://issues.apache.org/jira/browse/SPARK-29758
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.4.4
> Environment: EMR 5.15.0 (Spark 2.3.0) And MacBook Pro (Mojave 
> 10.14.3, Spark 2.4.4)
> Jdk 8, Scala 2.11.12
>Reporter: Stanislav
>Priority: Major
>
> `json_tuple` has inconsistent behaviour with `from_json` - but only if json 
> string is longer than 2700 characters or so.
> This can be reproduced in spark-shell and on cluster, but not in scalatest, 
> for some reason.
> {code}
> import org.apache.spark.sql.functions.{from_json, json_tuple}
> import org.apache.spark.sql.types._
> val counterstring = 
> "*3*5*7*9*12*15*18*21*24*27*30*33*36*39*42*45*48*51*54*57*60*63*66*69*72*75*78*81*84*87*90*93*96*99*103*107*111*115*119*123*127*131*135*139*143*147*151*155*159*163*167*171*175*179*183*187*191*195*199*203*207*211*215*219*223*227*231*235*239*243*247*251*255*259*263*267*271*275*279*283*287*291*295*299*303*307*311*315*319*323*327*331*335*339*343*347*351*355*359*363*367*371*375*379*383*387*391*395*399*403*407*411*415*419*423*427*431*435*439*443*447*451*455*459*463*467*471*475*479*483*487*491*495*499*503*507*511*515*519*523*527*531*535*539*543*547*551*555*559*563*567*571*575*579*583*587*591*595*599*603*607*611*615*619*623*627*631*635*639*643*647*651*655*659*663*667*671*675*679*683*687*691*695*699*703*707*711*715*719*723*727*731*735*739*743*747*751*755*759*763*767*771*775*779*783*787*791*795*799*803*807*811*815*819*823*827*831*835*839*843*847*851*855*859*863*867*871*875*879*883*887*891*895*899*903*907*911*915*919*923*927*931*935*939*943*947*951*955*959*963*967*971*975*979*983*987*991*995*1000*1005*1010*1015*1020*1025*1030*1035*1040*1045*1050*1055*1060*1065*1070*1075*1080*1085*1090*1095*1100*1105*1110*1115*1120*1125*1130*1135*1140*1145*1150*1155*1160*1165*1170*1175*1180*1185*1190*1195*1200*1205*1210*1215*1220*1225*1230*1235*1240*1245*1250*1255*1260*1265*1270*1275*1280*1285*1290*1295*1300*1305*1310*1315*1320*1325*1330*1335*1340*1345*1350*1355*1360*1365*1370*1375*1380*1385*1390*1395*1400*1405*1410*1415*1420*1425*1430*1435*1440*1445*1450*1455*1460*1465*1470*1475*1480*1485*1490*1495*1500*1505*1510*1515*1520*1525*1530*1535*1540*1545*1550*1555*1560*1565*1570*1575*1580*1585*1590*1595*1600*1605*1610*1615*1620*1625*1630*1635*1640*1645*1650*1655*1660*1665*1670*1675*1680*1685*1690*1695*1700*1705*1710*1715*1720*1725*1730*1735*1740*1745*1750*1755*1760*1765*1770*1775*1780*1785*1790*1795*1800*1805*1810*1815*1820*1825*1830*1835*1840*1845*1850*1855*1860*1865*1870*1875*1880*1885*1890*1895*1900*1905*1910*1915*1920*1925*1930*1935*1940*1945*1950*1955*1960*1965*1970*1975*1980*1985*1990*1995*2000*2005*2010*2015*2020*2025*2030*2035*2040*2045*2050*2055*2060*2065*2070*2075*2080*2085*2090*2095*2100*2105*2110*2115*2120*2125*2130*2135*2140*2145*2150*2155*2160*2165*2170*2175*2180*2185*2190*2195*2200*2205*2210*2215*2220*2225*2230*2235*2240*2245*2250*2255*2260*2265*2270*2275*2280*2285*2290*2295*2300*2305*2310*2315*2320*2325*2330*2335*2340*2345*2350*2355*2360*2365*2370*2375*2380*2385*2390*2395*2400*2405*2410*2415*2420*2425*2430*2435*2440*2445*2450*2455*2460*2465*2470*2475*2480*2485*2490*2495*2500*2505*2510*2515*2520*2525*2530*2535*2540*2545*2550*2555*2560*2565*2570*2575*2580*2585*2590*2595*2600*2605*2610*2615*2620*2625*2630*2635*2640*2645*2650*2655*2660*2665*2670*2675*2680*2685*2690*2695*2700*2705*2710*2715*2720*2725*2730*2735*2740*2745*2750*2755*2760*2765*2770*2775*2780*2785*2790*2795*2800*"
> val json_tuple_result = Seq(s"""{"test":"$counterstring"}""").toDF("json")
>   .withColumn("result", json_tuple('json, "test"))
>   .select('result)
>   .as[String].head.length
> val from_json_result = Seq(s"""{"test":"$counterstring"}""").toDF("json")
>   .withColumn("parsed", from_json('json, StructType(Seq(StructField("test", 
> StringType)))))
>   .withColumn("result", $"parsed.test")
>   .select('result)
>   .as[String].head.length
> scala> json_tuple_result
> res62: Int = 2791
> scala> from_json_result
> res63: Int = 2800
> {code}
> Result is influenced by the total length of the json string at the moment of 
> parsing:
> {

[jira] [Created] (SPARK-29949) JSON/CSV formats timestamps incorrectly

2019-11-18 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29949:
--

 Summary: JSON/CSV formats timestamps incorrectly
 Key: SPARK-29949
 URL: https://issues.apache.org/jira/browse/SPARK-29949
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


For example:
{code}
scala> val t = java.sql.Timestamp.valueOf("2019-11-18 11:56:00.123456")
t: java.sql.Timestamp = 2019-11-18 11:56:00.123456
scala> Seq(t).toDF("t").select(to_json(struct($"t"), Map("timestampFormat" -> 
"yyyy-MM-dd HH:mm:ss.SSSSSS"))).show(false)
+-------------------------------------------------+
|structstojson(named_struct(NamePlaceholder(), t))|
+-------------------------------------------------+
|{"t":"2019-11-18 11:56:00.000123"}               |
+-------------------------------------------------+
{code}
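
Presumably the expected output here is {"t":"2019-11-18 11:56:00.123456"}, i.e. the 
microsecond fraction written as-is rather than the millisecond value zero-padded to 
six digits.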






[jira] [Created] (SPARK-29963) Check formatting timestamps up to microsecond precision by JSON/CSV datasource

2019-11-19 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29963:
--

 Summary: Check formatting timestamps up to microsecond precision 
by JSON/CSV datasource
 Key: SPARK-29963
 URL: https://issues.apache.org/jira/browse/SPARK-29963
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Port tests added for 2.4 by the commit: 
https://github.com/apache/spark/commit/47cb1f359af62383e24198dbbaa0b4503348cd04






[jira] [Created] (SPARK-30165) Eliminate compilation warnings

2019-12-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30165:
--

 Summary: Eliminate compilation warnings
 Key: SPARK-30165
 URL: https://issues.apache.org/jira/browse/SPARK-30165
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.0.0
Reporter: Maxim Gekk


This is an umbrella ticket for sub-tasks for eliminating compilation warnings. 






[jira] [Updated] (SPARK-30165) Eliminate compilation warnings

2019-12-08 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-30165:
---
Attachment: spark_warnings.txt

> Eliminate compilation warnings
> --
>
> Key: SPARK-30165
> URL: https://issues.apache.org/jira/browse/SPARK-30165
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
> Attachments: spark_warnings.txt
>
>
> This is an umbrella ticket for sub-tasks for eliminating compilation 
> warnings. 






[jira] [Updated] (SPARK-30165) Eliminate compilation warnings

2019-12-08 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-30165:
---
Description: This is an umbrella ticket for sub-tasks for eliminating 
compilation warnings.  I dumped all warnings to the spark_warnings.txt file 
attached to the ticket.  (was: This is an umbrella ticket for sub-tasks for 
eliminating compilation warnings. )

> Eliminate compilation warnings
> --
>
> Key: SPARK-30165
> URL: https://issues.apache.org/jira/browse/SPARK-30165
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
> Attachments: spark_warnings.txt
>
>
> This is an umbrella ticket for sub-tasks for eliminating compilation 
> warnings.  I dumped all warnings to the spark_warnings.txt file attached to 
> the ticket.






[jira] [Created] (SPARK-30166) Eliminate compilation warnings in JSONOptions

2019-12-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30166:
--

 Summary: Eliminate compilation warnings in JSONOptions
 Key: SPARK-30166
 URL: https://issues.apache.org/jira/browse/SPARK-30166
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Scala 2.12 outputs the following warnings for JSONOptions:

{code}
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
Warning:Warning:line (137)Java enum ALLOW_NUMERIC_LEADING_ZEROS in Java 
enum Feature is deprecated: see corresponding Javadoc for more information.
factory.configure(JsonParser.Feature.ALLOW_NUMERIC_LEADING_ZEROS, 
allowNumericLeadingZeros)
Warning:Warning:line (138)Java enum ALLOW_NON_NUMERIC_NUMBERS in Java enum 
Feature is deprecated: see corresponding Javadoc for more information.
factory.configure(JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS, 
allowNonNumericNumbers)
Warning:Warning:line (139)Java enum ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER 
in Java enum Feature is deprecated: see corresponding Javadoc for more 
information.
factory.configure(JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER,
Warning:Warning:line (141)Java enum ALLOW_UNQUOTED_CONTROL_CHARS in Java 
enum Feature is deprecated: see corresponding Javadoc for more information.
factory.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, 
allowUnquotedControlChars)
{code}
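
One way to get rid of these warnings (a sketch, assuming Jackson 2.10+ is on the 
classpath): build the factory through JsonFactoryBuilder with the JsonReadFeature 
counterparts that the deprecation messages point to:

{code:scala}
import com.fasterxml.jackson.core.{JsonFactory, JsonFactoryBuilder}
import com.fasterxml.jackson.core.json.JsonReadFeature

// Sketch only: non-deprecated equivalents of the JsonParser.Feature flags.
def buildFactory(
    allowNumericLeadingZeros: Boolean,
    allowNonNumericNumbers: Boolean,
    allowBackslashEscapingAnyCharacter: Boolean,
    allowUnquotedControlChars: Boolean): JsonFactory = {
  new JsonFactoryBuilder()
    .configure(JsonReadFeature.ALLOW_LEADING_ZEROS_FOR_NUMBERS, allowNumericLeadingZeros)
    .configure(JsonReadFeature.ALLOW_NON_NUMERIC_NUMBERS, allowNonNumericNumbers)
    .configure(JsonReadFeature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER,
      allowBackslashEscapingAnyCharacter)
    .configure(JsonReadFeature.ALLOW_UNESCAPED_CONTROL_CHARS, allowUnquotedControlChars)
    .build()
}
{code}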







[jira] [Updated] (SPARK-30165) Eliminate compilation warnings

2019-12-08 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-30165:
---
Component/s: (was: Build)
 SQL

> Eliminate compilation warnings
> --
>
> Key: SPARK-30165
> URL: https://issues.apache.org/jira/browse/SPARK-30165
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
> Attachments: spark_warnings.txt
>
>
> This is an umbrella ticket for sub-tasks for eliminating compilation 
> warnings.  I dumped all warnings to the spark_warnings.txt file attached to 
> the ticket.






[jira] [Commented] (SPARK-30165) Eliminate compilation warnings

2019-12-08 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990925#comment-16990925
 ] 

Maxim Gekk commented on SPARK-30165:


[~aman_omer] Feel free to take a subset of the warnings and create a sub-task 
to fix them.

> Eliminate compilation warnings
> --
>
> Key: SPARK-30165
> URL: https://issues.apache.org/jira/browse/SPARK-30165
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
> Attachments: spark_warnings.txt
>
>
> This is an umbrella ticket for sub-tasks for eliminating compilation 
> warnings.  I dumped all warnings to the spark_warnings.txt file attached to 
> the ticket.






[jira] [Created] (SPARK-30168) Eliminate warnings in Parquet datasource

2019-12-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30168:
--

 Summary: Eliminate warnings in Parquet datasource
 Key: SPARK-30168
 URL: https://issues.apache.org/jira/browse/SPARK-30168
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


# 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala
{code}
Warning:Warning:line (120)class ParquetInputSplit in package hadoop is 
deprecated: see corresponding Javadoc for more information.
  Option[TimeZone]) => RecordReader[Void, T]): RecordReader[Void, T] = {
Warning:Warning:line (125)class ParquetInputSplit in package hadoop is 
deprecated: see corresponding Javadoc for more information.
  new org.apache.parquet.hadoop.ParquetInputSplit(
Warning:Warning:line (134)method readFooter in class ParquetFileReader is 
deprecated: see corresponding Javadoc for more information.
  ParquetFileReader.readFooter(conf, filePath, 
SKIP_ROW_GROUPS).getFileMetaData
Warning:Warning:line (183)class ParquetInputSplit in package hadoop is 
deprecated: see corresponding Javadoc for more information.
  split: ParquetInputSplit,
Warning:Warning:line (212)class ParquetInputSplit in package hadoop is 
deprecated: see corresponding Javadoc for more information.
  split: ParquetInputSplit,
{code}
# 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java
{code}
Warning:Warning:line (55)java: org.apache.parquet.hadoop.ParquetInputSplit in 
org.apache.parquet.hadoop has been deprecated
Warning:Warning:line (95)java: org.apache.parquet.hadoop.ParquetInputSplit 
in org.apache.parquet.hadoop has been deprecated
Warning:Warning:line (95)java: org.apache.parquet.hadoop.ParquetInputSplit 
in org.apache.parquet.hadoop has been deprecated
Warning:Warning:line (97)java: getRowGroupOffsets() in 
org.apache.parquet.hadoop.ParquetInputSplit has been deprecated
Warning:Warning:line (105)java: 
readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter)
 in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
Warning:Warning:line (108)java: 
filterRowGroups(org.apache.parquet.filter2.compat.FilterCompat.Filter,java.util.List,org.apache.parquet.schema.MessageType)
 in org.apache.parquet.filter2.compat.RowGroupFilter has been deprecated
Warning:Warning:line (111)java: 
readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter)
 in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
Warning:Warning:line (147)java: 
ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.parquet.hadoop.metadata.FileMetaData,org.apache.hadoop.fs.Path,java.util.List,java.util.List)
 in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
Warning:Warning:line (203)java: 
readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter)
 in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
Warning:Warning:line (226)java: 
ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.parquet.hadoop.metadata.FileMetaData,org.apache.hadoop.fs.Path,java.util.List,java.util.List)
 in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
{code}
# 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCompatibilityTest.scala
# 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetInteroperabilitySuite.scala
# 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTest.scala
# sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
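
For the deprecated readFooter calls above, a possible replacement (a sketch, not the 
committed change, assuming Parquet's InputFile-based API is available):

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.metadata.ParquetMetadata
import org.apache.parquet.hadoop.util.HadoopInputFile

// Sketch: read the footer via ParquetFileReader.open instead of the
// deprecated static readFooter.
def readFooter(conf: Configuration, filePath: Path): ParquetMetadata = {
  val reader = ParquetFileReader.open(HadoopInputFile.fromPath(filePath, conf))
  try reader.getFooter finally reader.close()
}
{code}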






[jira] [Updated] (SPARK-30166) Eliminate warnings in JSONOptions

2019-12-08 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-30166:
---
Summary: Eliminate warnings in JSONOptions  (was: Eliminate compilation 
warnings in JSONOptions)

> Eliminate warnings in JSONOptions
> -
>
> Key: SPARK-30166
> URL: https://issues.apache.org/jira/browse/SPARK-30166
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> Scala 2.12 outputs the following warnings for JSONOptions:
> {code}
> sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
> Warning:Warning:line (137)Java enum ALLOW_NUMERIC_LEADING_ZEROS in Java 
> enum Feature is deprecated: see corresponding Javadoc for more information.
> factory.configure(JsonParser.Feature.ALLOW_NUMERIC_LEADING_ZEROS, 
> allowNumericLeadingZeros)
> Warning:Warning:line (138)Java enum ALLOW_NON_NUMERIC_NUMBERS in Java 
> enum Feature is deprecated: see corresponding Javadoc for more information.
> factory.configure(JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS, 
> allowNonNumericNumbers)
> Warning:Warning:line (139)Java enum 
> ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER in Java enum Feature is deprecated: 
> see corresponding Javadoc for more information.
> 
> factory.configure(JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER,
> Warning:Warning:line (141)Java enum ALLOW_UNQUOTED_CONTROL_CHARS in Java 
> enum Feature is deprecated: see corresponding Javadoc for more information.
> factory.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, 
> allowUnquotedControlChars)
> {code}






[jira] [Created] (SPARK-30169) Eliminate warnings in Kafka connector

2019-12-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30169:
--

 Summary: Eliminate warnings in Kafka connector
 Key: SPARK-30169
 URL: https://issues.apache.org/jira/browse/SPARK-30169
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Eliminate compilation warnings in the files:
{code}
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/ConsumerStrategy.scala
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala
external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/DirectKafkaStreamSuite.scala
external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
{code}






[jira] [Created] (SPARK-30170) Eliminate warnings: part 1

2019-12-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30170:
--

 Summary: Eliminate warnings: part 1
 Key: SPARK-30170
 URL: https://issues.apache.org/jira/browse/SPARK-30170
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Eliminate compilation warnings in:
# StopWordsRemoverSuite
{code}
Warning:Warning:line (245)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (245)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (245)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (245)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
{code}
# MLTest.scala
{code}
Warning:Warning:line (88)match may not be exhaustive.
It would fail on the following inputs: NumericAttribute(), UnresolvedAttribute
val n = Attribute.fromStructField(dataframe.schema(colName)) match {
{code}
# FloatType.scala
{code}
Warning:Warning:line (81)method apply in object BigDecimal is deprecated (since 
2.11.0): The default conversion from Float may not do what you want. Use 
BigDecimal.decimal for a String representation, or explicitly convert the Float 
with .toDouble.
def quot(x: Float, y: Float): Float = (BigDecimal(x) quot 
BigDecimal(y)).floatValue
Warning:Warning:line (81)method apply in object BigDecimal is deprecated 
(since 2.11.0): The default conversion from Float may not do what you want. Use 
BigDecimal.decimal for a String representation, or explicitly convert the Float 
with .toDouble.
def quot(x: Float, y: Float): Float = (BigDecimal(x) quot 
BigDecimal(y)).floatValue
Warning:Warning:line (82)method apply in object BigDecimal is deprecated 
(since 2.11.0): The default conversion from Float may not do what you want. Use 
BigDecimal.decimal for a String representation, or explicitly convert the Float 
with .toDouble.
def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder 
BigDecimal(y)).floatValue
Warning:Warning:line (82)method apply in object BigDecimal is deprecated 
(since 2.11.0): The default conversion from Float may not do what you want. Use 
BigDecimal.decimal for a String representation, or explicitly convert the Float 
with .toDouble.
def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder 
BigDecimal(y)).floatValue
{code}
# AnalysisExternalCatalogSuite.scala
{code}
Warning:Warning:line (62)method verifyZeroInteractions in class Mockito is 
deprecated: see corresponding Javadoc for more information.
  verifyZeroInteractions(catalog)
{code}
# CSVExprUtilsSuite.scala
{code}
Warning:Warning:line (81)Octal escape literals are deprecated, use \u0000 
instead.
("\0", Some("\u0000"), None)
{code}
# CollectionExpressionsSuite.scala, HashExpressionsSuite.scala, 
ExpressionParserSuite.scala 
{code}
Warning:Warning:line (39)implicit conversion method stringToUTF8Str should be 
enabled
by making the implicit value scala.language.implicitConversions visible.
This can be achieved by adding the import clause 'import 
scala.language.implicitConversions'
or by setting the compiler option -language:implicitConversions.
See the

[jira] [Commented] (SPARK-30170) Eliminate warnings: part 1

2019-12-08 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990989#comment-16990989
 ] 

Maxim Gekk commented on SPARK-30170:


I am working on this

> Eliminate warnings: part 1
> --
>
> Key: SPARK-30170
> URL: https://issues.apache.org/jira/browse/SPARK-30170
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> Eliminate compilation warnings in:
> # StopWordsRemoverSuite
> {code}
> Warning:Warning:line (245)non-variable type argument String in type pattern 
> Seq[String] (the underlying of Seq[String]) is unchecked since it is 
> eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
> Seq[String]) =>
> Warning:Warning:line (245)non-variable type argument String in type 
> pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is 
> eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
> Seq[String]) =>
> Warning:Warning:line (245)non-variable type argument String in type 
> pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is 
> eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
> Seq[String]) =>
> Warning:Warning:line (245)non-variable type argument String in type 
> pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is 
> eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
> Seq[String]) =>
> Warning:Warning:line (271)non-variable type argument String in type 
> pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is 
> eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
> Seq[String]) =>
> Warning:Warning:line (271)non-variable type argument String in type 
> pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is 
> eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
> Seq[String]) =>
> Warning:Warning:line (271)non-variable type argument String in type 
> pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is 
> eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
> Seq[String]) =>
> Warning:Warning:line (271)non-variable type argument String in type 
> pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is 
> eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
> Seq[String]) =>
> {code}
> # MLTest.scala
> {code}
> Warning:Warning:line (88)match may not be exhaustive.
> It would fail on the following inputs: NumericAttribute(), UnresolvedAttribute
> val n = Attribute.fromStructField(dataframe.schema(colName)) match {
> {code}
> # FloatType.scala
> {code}
> Warning:Warning:line (81)method apply in object BigDecimal is deprecated 
> (since 2.11.0): The default conversion from Float may not do what you want. 
> Use BigDecimal.decimal for a String representation, or explicitly convert the 
> Float with .toDouble.
> def quot(x: Float, y: Float): Float = (BigDecimal(x) quot 
> BigDecimal(y)).floatValue
> Warning:Warning:line (81)method apply in object BigDecimal is deprecated 
> (since 2.11.0): The default conversion from Float may not do what you want. 
> Use BigDecimal.decimal for a String representation, or explicitly convert the 
> Float with .toDouble.
> def quot(x: Float, y: Float): Float = (BigDecimal(x) quot 
> BigDecimal(y)).floatValue
> Warning:Warning:line (82)method apply in object BigDecimal is deprecated 
> (since 2.11.0): The default conversion from Float may not do what you want. 
> Use BigDecimal.decimal for a String representation, or explicitly convert the 
> Float with .toDouble.
> def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder 
> BigDecimal(y)).floatValue
> Warning:Warning:line (82)method apply in object BigDecimal is deprecated 
> (since 2.11.0): The default conversion from Float may not do what you want. 
> Use BigDecimal.decimal for a String representation, or explicitly convert the 
> Float with .toDouble.
> def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder 
> BigDecimal(y)).floatValue
> {code}
> # AnalysisExternalCatalogSuite.scala
> {code}
> Warning:Warning:line (62)method verifyZeroInteractions in class Mockito is 
> deprecated: see corresponding Javadoc for more information.
>   verifyZeroInteractions(catalog)
> {code}
> # CSVExprUtilsSuite.scala
> {code}
> Warning:Warning:line (81)Octal escape literals are deprecated, use \u0000 
> instead.
> ("\0", Some("\u0000"), None)
> {c

[jira] [Updated] (SPARK-30170) Eliminate warnings: part 1

2019-12-08 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-30170:
---
Description: 
Eliminate compilation warnings in:
 # StopWordsRemoverSuite
{code:java}
Warning:Warning:line (245)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (245)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (245)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (245)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
{code}

 # MLTest.scala
{code:java}
Warning:Warning:line (88)match may not be exhaustive.
It would fail on the following inputs: NumericAttribute(), UnresolvedAttribute
val n = Attribute.fromStructField(dataframe.schema(colName)) match {
{code}

 # FloatType.scala
{code:java}
Warning:Warning:line (81)method apply in object BigDecimal is deprecated (since 
2.11.0): The default conversion from Float may not do what you want. Use 
BigDecimal.decimal for a String representation, or explicitly convert the Float 
with .toDouble.
def quot(x: Float, y: Float): Float = (BigDecimal(x) quot 
BigDecimal(y)).floatValue
Warning:Warning:line (81)method apply in object BigDecimal is deprecated 
(since 2.11.0): The default conversion from Float may not do what you want. Use 
BigDecimal.decimal for a String representation, or explicitly convert the Float 
with .toDouble.
def quot(x: Float, y: Float): Float = (BigDecimal(x) quot 
BigDecimal(y)).floatValue
Warning:Warning:line (82)method apply in object BigDecimal is deprecated 
(since 2.11.0): The default conversion from Float may not do what you want. Use 
BigDecimal.decimal for a String representation, or explicitly convert the Float 
with .toDouble.
def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder 
BigDecimal(y)).floatValue
Warning:Warning:line (82)method apply in object BigDecimal is deprecated 
(since 2.11.0): The default conversion from Float may not do what you want. Use 
BigDecimal.decimal for a String representation, or explicitly convert the Float 
with .toDouble.
def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder 
BigDecimal(y)).floatValue
{code}

 # AnalysisExternalCatalogSuite.scala
{code:java}
Warning:Warning:line (62)method verifyZeroInteractions in class Mockito is 
deprecated: see corresponding Javadoc for more information.
  verifyZeroInteractions(catalog)
{code}

 # CSVExprUtilsSuite.scala
{code:java}
Warning:Warning:line (81)Octal escape literals are deprecated, use \u 
instead.
("\0", Some("\u"), None)
{code}

 # CollectionExpressionsSuite.scala, HashExpressionsSuite.scala, 
ExpressionParserSuite.scala
{code:java}
Warning:Warning:line (39)implicit conversion method stringToUTF8Str should be 
enabled
by making the implicit value scala.language.implicitConversions visible.
This can be achieved by adding the import clause 'import 
scala.language.implicitConversions'
or by setting the compiler option -language:implicitConversions.
See the Scaladoc for value scala.language.implicitConversions for a discussion
why the feature should be explicitly enabled.
{code}

[jira] [Commented] (SPARK-30165) Eliminate compilation warnings

2019-12-12 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994806#comment-16994806
 ] 

Maxim Gekk commented on SPARK-30165:


> Are you sure on these?

I am almost sure we can fix the Parquet- and Kafka-related warnings. I am not sure about 
warnings coming from deprecated Spark APIs. Maybe it is possible to suppress 
such warnings in tests. In any case, we know in advance that we test deprecated 
APIs, so such warnings don't guard us against mistakes.

I quickly googled and found this 
[https://github.com/scala/bug/issues/7934#issuecomment-292425679] . Maybe we 
can use the approach in tests.
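
For example, a minimal sketch of that approach (the suite name and the test are made up; only the compiler behavior is the point): scalac suppresses deprecation warnings for code enclosed in a definition that is itself annotated @deprecated, and ScalaTest discovers suites by reflection, so no other Scala code has to reference the deprecated class:
{code:scala}
import org.scalatest.FunSuite

// Hypothetical suite: because the class itself is @deprecated, calls to
// deprecated Spark APIs inside it no longer produce deprecation warnings.
// ScalaTest finds the suite by reflection, so nothing else has to reference
// the deprecated class and trigger a warning of its own.
@deprecated("This suite intentionally exercises deprecated APIs", "3.0.0")
class LegacyApiSuite extends FunSuite {
  test("deprecated API still behaves as before") {
    // calls to deprecated Spark methods would go here, warning-free
  }
}
{code}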

> Eliminate compilation warnings
> --
>
> Key: SPARK-30165
> URL: https://issues.apache.org/jira/browse/SPARK-30165
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
> Attachments: spark_warnings.txt
>
>
> This is an umbrella ticket for sub-tasks for eliminating compilation 
> warnings.  I dumped all warnings to the spark_warnings.txt file attached to 
> the ticket.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30168) Eliminate warnings in Parquet datasource

2019-12-13 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995754#comment-16995754
 ] 

Maxim Gekk commented on SPARK-30168:


[~Ankitraj] Go ahead.

> Eliminate warnings in Parquet datasource
> 
>
> Key: SPARK-30168
> URL: https://issues.apache.org/jira/browse/SPARK-30168
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> # 
> sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala
> {code}
> Warning:Warning:line (120)class ParquetInputSplit in package hadoop is 
> deprecated: see corresponding Javadoc for more information.
>   Option[TimeZone]) => RecordReader[Void, T]): RecordReader[Void, T] 
> = {
> Warning:Warning:line (125)class ParquetInputSplit in package hadoop is 
> deprecated: see corresponding Javadoc for more information.
>   new org.apache.parquet.hadoop.ParquetInputSplit(
> Warning:Warning:line (134)method readFooter in class ParquetFileReader is 
> deprecated: see corresponding Javadoc for more information.
>   ParquetFileReader.readFooter(conf, filePath, 
> SKIP_ROW_GROUPS).getFileMetaData
> Warning:Warning:line (183)class ParquetInputSplit in package hadoop is 
> deprecated: see corresponding Javadoc for more information.
>   split: ParquetInputSplit,
> Warning:Warning:line (212)class ParquetInputSplit in package hadoop is 
> deprecated: see corresponding Javadoc for more information.
>   split: ParquetInputSplit,
> {code}
> # 
> sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java
> {code}
> Warning:Warning:line (55)java: org.apache.parquet.hadoop.ParquetInputSplit in 
> org.apache.parquet.hadoop has been deprecated
> Warning:Warning:line (95)java: 
> org.apache.parquet.hadoop.ParquetInputSplit in org.apache.parquet.hadoop has 
> been deprecated
> Warning:Warning:line (95)java: 
> org.apache.parquet.hadoop.ParquetInputSplit in org.apache.parquet.hadoop has 
> been deprecated
> Warning:Warning:line (97)java: getRowGroupOffsets() in 
> org.apache.parquet.hadoop.ParquetInputSplit has been deprecated
> Warning:Warning:line (105)java: 
> readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter)
>  in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
> Warning:Warning:line (108)java: 
> filterRowGroups(org.apache.parquet.filter2.compat.FilterCompat.Filter,java.util.List,org.apache.parquet.schema.MessageType)
>  in org.apache.parquet.filter2.compat.RowGroupFilter has been deprecated
> Warning:Warning:line (111)java: 
> readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter)
>  in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
> Warning:Warning:line (147)java: 
> ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.parquet.hadoop.metadata.FileMetaData,org.apache.hadoop.fs.Path,java.util.List,java.util.List)
>  in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
> Warning:Warning:line (203)java: 
> readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter)
>  in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
> Warning:Warning:line (226)java: 
> ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.parquet.hadoop.metadata.FileMetaData,org.apache.hadoop.fs.Path,java.util.List,java.util.List)
>  in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
> {code}
> # 
> sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCompatibilityTest.scala
> # 
> sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetInteroperabilitySuite.scala
> # 
> sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTest.scala
> # 
> sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30258) Eliminate warnings of depracted Spark APIs in tests

2019-12-13 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30258:
--

 Summary: Eliminate warnings of depracted Spark APIs in tests
 Key: SPARK-30258
 URL: https://issues.apache.org/jira/browse/SPARK-30258
 Project: Spark
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Suppress deprecation warnings in tests that check deprecated Spark APIs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30258) Eliminate warnings of deprecated Spark APIs in tests

2019-12-13 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-30258:
---
Summary: Eliminate warnings of deprecated Spark APIs in tests  (was: 
Eliminate warnings of depracted Spark APIs in tests)

> Eliminate warnings of deprecated Spark APIs in tests
> 
>
> Key: SPARK-30258
> URL: https://issues.apache.org/jira/browse/SPARK-30258
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> Suppress deprecation warnings in tests that check deprecated Spark APIs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30309) Mark `Filter` as a `sealed` class

2019-12-19 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30309:
--

 Summary: Mark `Filter` as a `sealed` class
 Key: SPARK-30309
 URL: https://issues.apache.org/jira/browse/SPARK-30309
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Add the `sealed` keyword to the `Filter` class in the 
`org.apache.spark.sql.sources` package, so that the compiler outputs a 
warning if handling of a filter is missed in a datasource:
{code}
Warning:(154, 65) match may not be exhaustive.
It would fail on the following inputs: AlwaysFalse(), AlwaysTrue()
def translate(filter: sources.Filter): Option[Expression] = filter match {
{code}
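
For illustration, a self-contained sketch of the effect with simplified stand-in classes (not the real org.apache.spark.sql.sources hierarchy):
{code:scala}
// Simplified stand-ins, only to show the effect of `sealed`: all subclasses
// must live in the same file, so the compiler can check matches for
// exhaustiveness at every use site.
sealed abstract class Filter
case class EqualTo(attribute: String, value: Any) extends Filter
case class IsNull(attribute: String) extends Filter
case class AlwaysTrue() extends Filter
case class AlwaysFalse() extends Filter

object FilterTranslationSketch {
  def translate(filter: Filter): Option[String] = filter match {
    case EqualTo(a, v) => Some(s"$a = $v")
    case IsNull(a)     => Some(s"$a IS NULL")
    // AlwaysTrue() and AlwaysFalse() are not handled, so scalac reports:
    // "match may not be exhaustive. It would fail on the following inputs: ..."
  }
}
{code}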



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30323) Support filters pushdown in CSV datasource

2019-12-20 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30323:
--

 Summary: Support filters pushdown in CSV datasource
 Key: SPARK-30323
 URL: https://issues.apache.org/jira/browse/SPARK-30323
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


- Implement the `SupportsPushDownFilters` interface in `CSVScanBuilder` (see the sketch after this list)
- Apply the pushed filters in UnivocityParser
- Change the UnivocityParser API: return Seq[InternalRow] from `convert()`
- Update CSVBenchmark
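
A rough sketch of the first item (illustrative only; the real CSVScanBuilder carries the schema and options and builds a CSVScan):
{code:scala}
import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownFilters}
import org.apache.spark.sql.sources.Filter

// Illustrative: only the filter pushdown plumbing is shown.
class CSVScanBuilderSketch(buildScan: Array[Filter] => Scan)
  extends ScanBuilder with SupportsPushDownFilters {

  private var _pushedFilters: Array[Filter] = Array.empty

  override def pushFilters(filters: Array[Filter]): Array[Filter] = {
    // Remember all filters so the parser can skip non-matching rows early.
    // Returning the same array tells Spark it must still re-evaluate them,
    // which keeps the pushdown purely an optimization.
    _pushedFilters = filters
    filters
  }

  override def pushedFilters(): Array[Filter] = _pushedFilters

  override def build(): Scan = buildScan(_pushedFilters)
}
{code}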



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30401) Call requireNonStaticConf() only once

2020-01-01 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30401:
--

 Summary: Call requireNonStaticConf() only once
 Key: SPARK-30401
 URL: https://issues.apache.org/jira/browse/SPARK-30401
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


The RuntimeConfig.requireNonStaticConf() method can be called 2 times for the 
same input:
1. Inside of set(key, true)
2. set() converts the second argument to a string and calls set(key, 
"true"), where requireNonStaticConf() is invoked one more time



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30401) Call requireNonStaticConf() only once

2020-01-01 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006392#comment-17006392
 ] 

Maxim Gekk commented on SPARK-30401:


I am working on it

> Call requireNonStaticConf() only once
> -
>
> Key: SPARK-30401
> URL: https://issues.apache.org/jira/browse/SPARK-30401
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Maxim Gekk
>Priority: Trivial
>
> The RuntimeConfig.requireNonStaticConf() method can be called 2 times for the 
> same input:
> 1. Inside of set(key, true)
> 2. set() converts the second argument to a string and calls set(key, 
> "true"), where requireNonStaticConf() is invoked one more time



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30409) Use `NoOp` datasource in SQL benchmarks

2020-01-02 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30409:
--

 Summary: Use `NoOp` datasource in SQL benchmarks
 Key: SPARK-30409
 URL: https://issues.apache.org/jira/browse/SPARK-30409
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


Currently, SQL benchmarks use the `count()`, `collect()` and `foreach(_ => ())` 
actions. These actions have additional overhead. For example, `collect()` 
converts column values to external type values and pulls the data to the driver. 
The benchmarks should be unified to use the `NoOp` datasource, except the 
benchmarks that specifically measure `count()` or `collect()`.
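
For example (a sketch assuming a SparkSession is in scope; the query and row count are made up), a benchmark case can materialize the result through the built-in `noop` write format:
{code:scala}
// Writing to the `noop` source runs the whole query but discards the rows,
// so nothing is converted to external types or pulled to the driver.
spark.range(100 * 1000 * 1000)
  .selectExpr("id", "id % 10 AS key")
  .write
  .format("noop")
  .mode("overwrite")
  .save()
{code}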



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30171) Eliminate warnings: part2

2020-01-02 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006949#comment-17006949
 ] 

Maxim Gekk commented on SPARK-30171:


[~srowen] SPARK-30258 fixes the warnings in AvroFunctionsSuite.scala but not the ones 
about parsedOptions.ignoreExtension. I am not sure how we can avoid the warnings related 
to ignoreExtension.

> Eliminate warnings: part2
> -
>
> Key: SPARK-30171
> URL: https://issues.apache.org/jira/browse/SPARK-30171
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Minor
>
> AvroFunctionsSuite.scala
> Warning:Warning:line (41)method to_avro in package avro is deprecated (since 
> 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' instead.
> val avroDF = df.select(to_avro('id).as("a"), to_avro('str).as("b"))
> Warning:Warning:line (41)method to_avro in package avro is deprecated 
> (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' 
> instead.
> val avroDF = df.select(to_avro('id).as("a"), to_avro('str).as("b"))
> Warning:Warning:line (54)method from_avro in package avro is deprecated 
> (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.from_avro' 
> instead.
> checkAnswer(avroDF.select(from_avro('a, avroTypeLong), from_avro('b, 
> avroTypeStr)), df)
> Warning:Warning:line (54)method from_avro in package avro is deprecated 
> (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.from_avro' 
> instead.
> checkAnswer(avroDF.select(from_avro('a, avroTypeLong), from_avro('b, 
> avroTypeStr)), df)
> Warning:Warning:line (59)method to_avro in package avro is deprecated 
> (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' 
> instead.
> val avroStructDF = df.select(to_avro('struct).as("avro"))
> Warning:Warning:line (70)method from_avro in package avro is deprecated 
> (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.from_avro' 
> instead.
> checkAnswer(avroStructDF.select(from_avro('avro, avroTypeStruct)), df)
> Warning:Warning:line (76)method to_avro in package avro is deprecated 
> (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' 
> instead.
> val avroStructDF = df.select(to_avro('struct).as("avro"))
> Warning:Warning:line (118)method to_avro in package avro is deprecated 
> (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' 
> instead.
> val readBackOne = dfOne.select(to_avro($"array").as("avro"))
> Warning:Warning:line (119)method from_avro in package avro is deprecated 
> (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.from_avro' 
> instead.
>   .select(from_avro($"avro", avroTypeArrStruct).as("array"))
> AvroPartitionReaderFactory.scala
> Warning:Warning:line (64)value ignoreExtension in class AvroOptions is 
> deprecated (since 3.0): Use the general data source option pathGlobFilter for 
> filtering file names
> if (parsedOptions.ignoreExtension || 
> partitionedFile.filePath.endsWith(".avro")) {
> AvroFileFormat.scala
> Warning:Warning:line (98)value ignoreExtension in class AvroOptions is 
> deprecated (since 3.0): Use the general data source option pathGlobFilter for 
> filtering file names
>   if (parsedOptions.ignoreExtension || file.filePath.endsWith(".avro")) {
> AvroUtils.scala
> Warning:Warning:line (55)value ignoreExtension in class AvroOptions is 
> deprecated (since 3.0): Use the general data source option pathGlobFilter for 
> filtering file names
> inferAvroSchemaFromFiles(files, conf, parsedOptions.ignoreExtension,



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30172) Eliminate warnings: part3

2020-01-02 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006952#comment-17006952
 ] 

Maxim Gekk commented on SPARK-30172:


[~Ankitraj] Are you still working on this?

> Eliminate warnings: part3
> -
>
> Key: SPARK-30172
> URL: https://issues.apache.org/jira/browse/SPARK-30172
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Minor
>
> /sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
> Warning:Warning:line (422)method initialize in class AbstractSerDe is 
> deprecated: see corresponding Javadoc for more information.
> serde.initialize(null, properties)
> /sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
> Warning:Warning:line (216)method initialize in class GenericUDTF is 
> deprecated: see corresponding Javadoc for more information.
>   protected lazy val outputInspector = 
> function.initialize(inputInspectors.toArray)
> Warning:Warning:line (342)class UDAF in package exec is deprecated: see 
> corresponding Javadoc for more information.
>   new GenericUDAFBridge(funcWrapper.createFunction[UDAF]())
> Warning:Warning:line (503)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
> def serialize(buffer: AggregationBuffer): Array[Byte] = {
> Warning:Warning:line (523)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
> def deserialize(bytes: Array[Byte]): AggregationBuffer = {
> Warning:Warning:line (538)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
> case class HiveUDAFBuffer(buf: AggregationBuffer, canDoMerge: Boolean)
> Warning:Warning:line (538)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
> case class HiveUDAFBuffer(buf: AggregationBuffer, canDoMerge: Boolean)
> /sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkOrcNewRecordReader.java
> Warning:Warning:line (44)java: getTypes() in org.apache.orc.Reader has 
> been deprecated
> Warning:Warning:line (47)java: getTypes() in org.apache.orc.Reader has 
> been deprecated
> /sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
> Warning:Warning:line (2,368)method readFooter in class ParquetFileReader 
> is deprecated: see corresponding Javadoc for more information.
> val footer = ParquetFileReader.readFooter(
> /sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDAFSuite.scala
> Warning:Warning:line (202)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
>   override def getNewAggregationBuffer: AggregationBuffer = new 
> MockUDAFBuffer(0L, 0L)
> Warning:Warning:line (204)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
>   override def reset(agg: AggregationBuffer): Unit = {
> Warning:Warning:line (212)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
>   override def iterate(agg: AggregationBuffer, parameters: Array[AnyRef]): 
> Unit = {
> Warning:Warning:line (221)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
>   override def merge(agg: AggregationBuffer, partial: Object): Unit = {
> Warning:Warning:line (231)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
>   override def terminatePartial(agg: AggregationBuffer): AnyRef = {
> Warning:Warning:line (236)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
>   override def terminate(agg: AggregationBuffer): AnyRef = 
> terminatePartial(agg)
> Warning:Warning:line (257)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
>   override def getNewAggregationBuffer: AggregationBuffer = {
> Warning:Warning:line (266)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
>   override def reset(agg: AggregationBuffer): Unit = {
> Warning:Warning:line (277)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
>   override def iterate(agg: AggregationBuffer, parameters: Arr

[jira] [Commented] (SPARK-30174) Eliminate warnings :part 4

2020-01-02 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006953#comment-17006953
 ] 

Maxim Gekk commented on SPARK-30174:


[~shivuson...@gmail.com] Are you still working on this? If so, could you write 
in the ticket how you are going to fix the warnings, please.

> Eliminate warnings :part 4
> --
>
> Key: SPARK-30174
> URL: https://issues.apache.org/jira/browse/SPARK-30174
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: jobit mathew
>Priority: Minor
>
> sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
> {code:java}
> Warning:Warning:line (127)value ENABLE_JOB_SUMMARY in class 
> ParquetOutputFormat is deprecated: see corresponding Javadoc for more 
> information.
>   && conf.get(ParquetOutputFormat.ENABLE_JOB_SUMMARY) == null) {
> Warning:Warning:line (261)class ParquetInputSplit in package hadoop is 
> deprecated: see corresponding Javadoc for more information.
> new org.apache.parquet.hadoop.ParquetInputSplit(
> Warning:Warning:line (272)method readFooter in class ParquetFileReader is 
> deprecated: see corresponding Javadoc for more information.
> ParquetFileReader.readFooter(sharedConf, filePath, 
> SKIP_ROW_GROUPS).getFileMetaData
> Warning:Warning:line (442)method readFooter in class ParquetFileReader is 
> deprecated: see corresponding Javadoc for more information.
>   ParquetFileReader.readFooter(
> {code}
> sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetWriteBuilder.scala
> {code:java}
>  Warning:Warning:line (91)value ENABLE_JOB_SUMMARY in class 
> ParquetOutputFormat is deprecated: see corresponding Javadoc for more 
> information.
>   && conf.get(ParquetOutputFormat.ENABLE_JOB_SUMMARY) == null) {
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30412) Eliminate warnings in Java tests regarding deprecated API

2020-01-02 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30412:
--

 Summary: Eliminate warnings in Java tests regarding deprecated 
API
 Key: SPARK-30412
 URL: https://issues.apache.org/jira/browse/SPARK-30412
 Project: Spark
  Issue Type: Sub-task
  Components: Java API, SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


Suppress warnings about deprecated Spark API in Java test suites:
{code}
/Users/maxim/proj/eliminate-warnings-part2/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetAggregatorSuite.java
Warning:Warning:line (32)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (91)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (100)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (109)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (118)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
{code}
{code}
/Users/maxim/proj/eliminate-warnings-part2/sql/core/src/test/java/test/org/apache/spark/sql/Java8DatasetAggregatorSuite.java
Warning:Warning:line (28)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (37)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (46)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (55)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (64)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
{code}
{code}
/Users/maxim/proj/eliminate-warnings-part2/sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java
Warning:Warning:line (478)java: 
json(org.apache.spark.api.java.JavaRDD) in 
org.apache.spark.sql.DataFrameReader has been deprecated
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33381) Unify DSv1 and DSv2 command tests

2020-11-07 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33381:
---
Summary: Unify DSv1 and DSv2 command tests  (was: Unify dsv1 and dsv2 
command tests)

> Unify DSv1 and DSv2 command tests
> -
>
> Key: SPARK-33381
> URL: https://issues.apache.org/jira/browse/SPARK-33381
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Create unified test suites for DSv1 and DSv2 commands like CREATE TABLE, SHOW 
> TABLES, etc. Put datasource-specific tests into separate test suites. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33381) Unify dsv1 and dsv2 command tests

2020-11-07 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33381:
--

 Summary: Unify dsv1 and dsv2 command tests
 Key: SPARK-33381
 URL: https://issues.apache.org/jira/browse/SPARK-33381
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


Create unified test suites for DSv1 and DSv2 commands like CREATE TABLE, SHOW 
TABLES, etc. Put datasource-specific tests into separate test suites. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33382) Unify v1 and v2 SHOW TABLES tests

2020-11-07 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33382:
--

 Summary: Unify v1 and v2 SHOW TABLES tests
 Key: SPARK-33382
 URL: https://issues.apache.org/jira/browse/SPARK-33382
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


Gather the common tests for the DSv1 and DSv2 SHOW TABLES command into a common 
trait. Mix this trait into datasource-specific test suites.
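
A rough sketch of the intended layout (all trait/class names and the test body are illustrative, not the final suites):
{code:scala}
import org.apache.spark.sql.{QueryTest, Row}
import org.apache.spark.sql.test.{SharedSparkSession, SQLTestUtils}

// Common checks live in a trait; datasource-specific suites only fix the
// USING clause (and, in the real suites, the catalog to run against).
trait ShowTablesSuiteBase extends QueryTest with SQLTestUtils {
  def defaultUsing: String

  test("show an existing table") {
    withTable("tbl") {
      sql(s"CREATE TABLE tbl (id bigint, data string) $defaultUsing")
      checkAnswer(sql("SHOW TABLES").select("tableName"), Row("tbl"))
    }
  }
}

// One datasource-specific flavor of the suite.
class ShowTablesParquetSuite extends ShowTablesSuiteBase with SharedSparkSession {
  override def defaultUsing: String = "USING parquet"
}
{code}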



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33392) Align DSv2 commands to DSv1 implementation

2020-11-09 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33392:
--

 Summary: Align DSv2 commands to DSv1 implementation
 Key: SPARK-33392
 URL: https://issues.apache.org/jira/browse/SPARK-33392
 Project: Spark
  Issue Type: Umbrella
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


The purpose of this umbrella ticket is to:
# Implement missing features of datasource v1 commands in DSv2
# Align the behavior of DSv2 commands with the current implementation of DSv1 
commands as much as possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33305) DSv2: DROP TABLE command should also invalidate cache

2020-11-09 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33305:
---
Parent: SPARK-33392
Issue Type: Sub-task  (was: Bug)

> DSv2: DROP TABLE command should also invalidate cache
> -
>
> Key: SPARK-33305
> URL: https://issues.apache.org/jira/browse/SPARK-33305
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Chao Sun
>Priority: Major
>
> Different from DSv1, {{DROP TABLE}} command in DSv2 currently only drops the 
> table but doesn't invalidate all caches referencing the table. We should make 
> the behavior consistent between v1 and v2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33364) Expose purge option in TableCatalog.dropTable

2020-11-09 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33364:
---
Parent: SPARK-33392
Issue Type: Sub-task  (was: New Feature)

> Expose purge option in TableCatalog.dropTable
> -
>
> Key: SPARK-33364
> URL: https://issues.apache.org/jira/browse/SPARK-33364
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Minor
> Fix For: 3.1.0
>
>
> TableCatalog.dropTable currently does not support the purge option.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33393) Support SHOW TABLE EXTENDED in DSv2

2020-11-09 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33393:
--

 Summary: Support SHOW TABLE EXTENDED in DSv2
 Key: SPARK-33393
 URL: https://issues.apache.org/jira/browse/SPARK-33393
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


The current implementation of DSv2 SHOW TABLE doesn't support the EXTENDED mode in:
https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowTablesExec.scala#L33
which is supported in DSv1:
https://github.com/apache/spark/blob/7e99fcd64efa425f3c985df4fe957a3be274a49a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L870

We need to add the same functionality to ShowTablesExec.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33394) Throw `NoSuchDatabaseException` for a non-existing namespace in DSv2 SHOW TABLES

2020-11-09 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33394:
--

 Summary: Throw `NoSuchDatabaseException` for a non-existing 
namespace in DSv2 SHOW TABLES
 Key: SPARK-33394
 URL: https://issues.apache.org/jira/browse/SPARK-33394
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


The current implementation of DSv2 SHOW TABLES returns an empty result for a 
non-existing database/namespace. This implementation should be aligned with DSv1, 
which throws the `NoSuchDatabaseException` exception.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33403) DSv2 SHOW TABLES doesn't show `default`

2020-11-09 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33403:
--

 Summary: DSv2 SHOW TABLES doesn't show `default`
 Key: SPARK-33403
 URL: https://issues.apache.org/jira/browse/SPARK-33403
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


DSv1:
{code:scala}
  test("namespace is not specified and the default catalog is set") {
withSQLConf(SQLConf.DEFAULT_CATALOG.key -> catalog) {
  withTable("table") {
spark.sql(s"CREATE TABLE table (id bigint, data string) $defaultUsing")
sql("SHOW TABLES").show()
  }
}
  }
{code}
{code}
++-+---+
|database|tableName|isTemporary|
++-+---+
| default|table|  false|
++-+---+
{code}

DSv2:
{code}
+-+-+
|namespace|tableName|
+-+-+
| |table|
+-+-+
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33426) Unify Hive SHOW TABLES tests

2020-11-11 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33426:
--

 Summary: Unify Hive SHOW TABLES tests
 Key: SPARK-33426
 URL: https://issues.apache.org/jira/browse/SPARK-33426
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


1. Move the Hive SHOW TABLES tests to a separate test suite
2. Make the new test suite extend the common SHOW TABLES trait



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33430) Support namespaces in JDBC v2 Table Catalog

2020-11-12 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33430:
--

 Summary: Support namespaces in JDBC v2 Table Catalog
 Key: SPARK-33430
 URL: https://issues.apache.org/jira/browse/SPARK-33430
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


When I make JDBCTableCatalogSuite extend 
org.apache.spark.sql.execution.command.v2.ShowTablesSuite, for instance:
{code:scala}
import org.apache.spark.sql.execution.command.v2.ShowTablesSuite

class JDBCTableCatalogSuite extends ShowTablesSuite {
  override def version: String = "JDBC V2"
  override def catalog: String = "h2"
...
{code}
some tests from JDBCTableCatalogSuite fail with:
{code}
[info] - SHOW TABLES JDBC V2: show an existing table *** FAILED *** (2 seconds, 
502 milliseconds)
[info]   org.apache.spark.sql.AnalysisException: Cannot use catalog h2: does 
not support namespaces;
[info]   at 
org.apache.spark.sql.connector.catalog.CatalogV2Implicits$CatalogHelper.asNamespaceCatalog(CatalogV2Implicits.scala:83)
[info]   at 
org.apache.spark.sql.catalyst.analysis.ResolveCatalogs$$anonfun$apply$1.applyOrElse(ResolveCatalogs.scala:208)
[info]   at 
org.apache.spark.sql.catalyst.analysis.ResolveCatalogs$$anonfun$apply$1.applyOrElse(ResolveCatalogs.scala:34)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33430) Support namespaces in JDBC v2 Table Catalog

2020-11-12 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230514#comment-17230514
 ] 

Maxim Gekk commented on SPARK-33430:


[~cloud_fan] [~huaxingao] It would be nice to support namespaces, WDYT?

> Support namespaces in JDBC v2 Table Catalog
> ---
>
> Key: SPARK-33430
> URL: https://issues.apache.org/jira/browse/SPARK-33430
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> When I extend JDBCTableCatalogSuite by 
> org.apache.spark.sql.execution.command.v2.ShowTablesSuite, for instance:
> {code:scala}
> import org.apache.spark.sql.execution.command.v2.ShowTablesSuite
> class JDBCTableCatalogSuite extends ShowTablesSuite {
>   override def version: String = "JDBC V2"
>   override def catalog: String = "h2"
> ...
> {code}
> some tests from JDBCTableCatalogSuite fail with:
> {code}
> [info] - SHOW TABLES JDBC V2: show an existing table *** FAILED *** (2 
> seconds, 502 milliseconds)
> [info]   org.apache.spark.sql.AnalysisException: Cannot use catalog h2: does 
> not support namespaces;
> [info]   at 
> org.apache.spark.sql.connector.catalog.CatalogV2Implicits$CatalogHelper.asNamespaceCatalog(CatalogV2Implicits.scala:83)
> [info]   at 
> org.apache.spark.sql.catalyst.analysis.ResolveCatalogs$$anonfun$apply$1.applyOrElse(ResolveCatalogs.scala:208)
> [info]   at 
> org.apache.spark.sql.catalyst.analysis.ResolveCatalogs$$anonfun$apply$1.applyOrElse(ResolveCatalogs.scala:34)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33393) Support SHOW TABLE EXTENDED in DSv2

2020-11-14 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232094#comment-17232094
 ] 

Maxim Gekk commented on SPARK-33393:


I plan to work on this soon.

> Support SHOW TABLE EXTENDED in DSv2
> ---
>
> Key: SPARK-33393
> URL: https://issues.apache.org/jira/browse/SPARK-33393
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Current implementation of DSv2 SHOW TABLE doesn't support the EXTENDED mode 
> in:
> https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowTablesExec.scala#L33
> which is supported in DSv1:
> https://github.com/apache/spark/blob/7e99fcd64efa425f3c985df4fe957a3be274a49a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L870
> Need to add the same functionality to ShowTablesExec.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33452) Create a V2 SHOW PARTITIONS execution node

2020-11-14 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232095#comment-17232095
 ] 

Maxim Gekk commented on SPARK-33452:


I plan to work on this soon.

> Create a V2 SHOW PARTITIONS execution node
> --
>
> Key: SPARK-33452
> URL: https://issues.apache.org/jira/browse/SPARK-33452
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> There is the V1 SHOW PARTITIONS implementation:
> https://github.com/apache/spark/blob/7e99fcd64efa425f3c985df4fe957a3be274a49a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L975
> The ticket aims to add V2 implementation with similar behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33452) Create a V2 SHOW PARTITIONS execution node

2020-11-14 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33452:
--

 Summary: Create a V2 SHOW PARTITIONS execution node
 Key: SPARK-33452
 URL: https://issues.apache.org/jira/browse/SPARK-33452
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


There is the V1 SHOW PARTITIONS implementation:
https://github.com/apache/spark/blob/7e99fcd64efa425f3c985df4fe957a3be274a49a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L975
The ticket aims to add V2 implementation with similar behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33453) Unify v1 and v2 SHOW PARTITIONS tests

2020-11-14 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33453:
--

 Summary: Unify v1 and v2 SHOW PARTITIONS tests
 Key: SPARK-33453
 URL: https://issues.apache.org/jira/browse/SPARK-33453
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


Gather the common tests for the DSv1 and DSv2 SHOW PARTITIONS command into a common 
trait. Mix this trait into datasource-specific test suites.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33505) Fix insert into `InMemoryPartitionTable`

2020-11-20 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33505:
--

 Summary: Fix insert into `InMemoryPartitionTable`
 Key: SPARK-33505
 URL: https://issues.apache.org/jira/browse/SPARK-33505
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


Currently, INSERT INTO a partitioned table in the V2 in-memory catalog doesn't 
create partitions. The example below demonstrates the issue:

{code:scala}
  test("insert into partitioned table") {
val t = "testpart.ns1.ns2.tbl"
withTable(t) {
  spark.sql(
s"""
   |CREATE TABLE $t (id bigint, name string, data string)
   |USING foo
   |PARTITIONED BY (id, name)""".stripMargin)
  spark.sql(s"INSERT INTO $t PARTITION(id = 1, name = 'Max') SELECT 'abc'")

  val partTable = catalog("testpart").asTableCatalog
.loadTable(Identifier.of(Array("ns1", "ns2"), 
"tbl")).asInstanceOf[InMemoryPartitionTable]
  assert(partTable.partitionExists(InternalRow.fromSeq(Seq(1, 
UTF8String.fromString("Max")
}
  }
{code}

The partitionExists() function returns false for the partitions that must have 
been created.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33509) List partition by names from V2 tables that support partition management

2020-11-21 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33509:
--

 Summary: List partition by names from V2 tables that support 
partition management
 Key: SPARK-33509
 URL: https://issues.apache.org/jira/browse/SPARK-33509
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


Currently, the SupportsPartitionManagement interface exposes only the 
listPartitionIdentifiers() method, which does not allow listing partitions by 
name. So, it is hard to implement:
{code:java}
SHOW PARTITIONS table PARTITION(month=2)
{code}
for a table like:
{code:java}
CREATE TABLE $table (price int, qty int, year int, month int)
USING parquet
partitioned by (year, month)
{code}
because listPartitionIdentifiers() requires specifying a value for *year*.
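
A possible shape of the extended API (a hedged sketch; the final method name and signature may differ):
{code:scala}
import org.apache.spark.sql.catalyst.InternalRow

// Illustrative trait mirroring the relevant part of SupportsPartitionManagement.
trait PartitionListingSketch {
  // Existing style: `ident` must carry a value for every partition column,
  // so "SHOW PARTITIONS table PARTITION(month=2)" cannot be expressed.
  def listPartitionIdentifiers(ident: InternalRow): Array[InternalRow]

  // Proposed style: `names` selects which partition columns `ident` binds,
  // e.g. names = Array("month"), ident with a single value 2, leaving `year` free.
  def listPartitionByNames(names: Array[String], ident: InternalRow): Array[InternalRow]
}
{code}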



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33511) Respect case sensitivity in resolving partition specs V2

2020-11-21 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33511:
--

 Summary: Respect case sensitivity in resolving partition specs V2
 Key: SPARK-33511
 URL: https://issues.apache.org/jira/browse/SPARK-33511
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


DSv1 DDL commands respect the SQL config spark.sql.caseSensitive, for example
{code:java}
spark-sql> CREATE TABLE tbl1 (id bigint, data string) USING parquet PARTITIONED 
BY (id);
spark-sql> ALTER TABLE tbl1 ADD PARTITION (ID=1);
spark-sql> SHOW PARTITIONS tbl1;
id=1
{code}
but the same ALTER TABLE command fails on DSv2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33521) Universal type conversion of V2 partition values

2020-11-23 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33521:
--

 Summary: Universal type conversion of V2 partition values
 Key: SPARK-33521
 URL: https://issues.apache.org/jira/browse/SPARK-33521
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


Support other types while resolving partition specs in

https://github.com/apache/spark/blob/23e9920b3910e4f05269853429c7f1cdc7b5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolvePartitionSpec.scala#L72
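
The rough idea (a hedged sketch, not the code at the linked line): cast each raw partition value to the type declared in the partition schema instead of keeping everything as strings:
{code:scala}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
import org.apache.spark.sql.types.StructType

object PartitionSpecConversionSketch {
  // Hypothetical helper: turn a raw (string) partition spec into an InternalRow
  // whose values match the partition schema's data types.
  def convertPartitionSpec(spec: Map[String, String], partSchema: StructType): InternalRow = {
    val converted = partSchema.map { field =>
      Cast(Literal(spec(field.name)), field.dataType).eval()
    }
    InternalRow.fromSeq(converted)
  }
}
{code}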



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33529) Resolver of V2 partition specs doesn't handle __HIVE_DEFAULT_PARTITION__

2020-11-24 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33529:
--

 Summary: Resolver of V2 partition specs doesn't handle 
__HIVE_DEFAULT_PARTITION__
 Key: SPARK-33529
 URL: https://issues.apache.org/jira/browse/SPARK-33529
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


The partition value '__HIVE_DEFAULT_PARTITION__' should be handled as null - 
the same as DSv1 does.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33529) Resolver of V2 partition specs doesn't handle __HIVE_DEFAULT_PARTITION__

2020-11-24 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33529:
---
Description: 
The partition value '__HIVE_DEFAULT_PARTITION__' should be handled as null - 
the same as DSv1 does.

For example in DSv1:

{code:java}
spark-sql> CREATE TABLE tbl11 (id int, part0 string) USING parquet PARTITIONED 
BY (part0);
spark-sql> ALTER TABLE tbl11 ADD PARTITION (part0 = 
'__HIVE_DEFAULT_PARTITION__');
spark-sql> INSERT INTO tbl11 PARTITION (part0='__HIVE_DEFAULT_PARTITION__') 
SELECT 1;
spark-sql> SELECT * FROM tbl11;
1   NULL
{code}
 

 

  was:The partition value '__HIVE_DEFAULT_PARTITION__' should be handled as 
null - the same as DSv1 does.


> Resolver of V2 partition specs doesn't handle __HIVE_DEFAULT_PARTITION__
> 
>
> Key: SPARK-33529
> URL: https://issues.apache.org/jira/browse/SPARK-33529
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The partition value '__HIVE_DEFAULT_PARTITION__' should be handled as null - 
> the same as DSv1 does.
> For example in DSv1:
> {code:java}
> spark-sql> CREATE TABLE tbl11 (id int, part0 string) USING parquet 
> PARTITIONED BY (part0);
> spark-sql> ALTER TABLE tbl11 ADD PARTITION (part0 = 
> '__HIVE_DEFAULT_PARTITION__');
> spark-sql> INSERT INTO tbl11 PARTITION (part0='__HIVE_DEFAULT_PARTITION__') 
> SELECT 1;
> spark-sql> SELECT * FROM tbl11;
> 1 NULL
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33558) Unify v1 and v2 ALTER TABLE .. PARTITION tests

2020-11-25 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33558:
--

 Summary: Unify v1 and v2 ALTER TABLE .. PARTITION tests
 Key: SPARK-33558
 URL: https://issues.apache.org/jira/browse/SPARK-33558
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


Extract the ALTER TABLE .. PARTITION tests to a common place to run them for V1 
and V2 datasources. Some tests can be placed in V1- and V2-specific test suites.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33569) Remove getting partitions by only ident

2020-11-26 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33569:
--

 Summary: Remove getting partitions by only ident
 Key: SPARK-33569
 URL: https://issues.apache.org/jira/browse/SPARK-33569
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


This is a follow-up of SPARK-33509, which added a function for getting 
partitions by names and ident. The function which gets partitions by ident only 
is not used anymore, and it can be removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33585) The comment for SQLContext.tables() doesn't mention the `database` column

2020-11-28 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33585:
--

 Summary: The comment for SQLContext.tables() doesn't mention the 
`database` column
 Key: SPARK-33585
 URL: https://issues.apache.org/jira/browse/SPARK-33585
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Affects Versions: 3.0.1, 2.4.7, 3.1.0
Reporter: Maxim Gekk


The comment says: "The returned DataFrame has two columns, tableName and 
isTemporary":
https://github.com/apache/spark/blob/b26ae98407c6c017a4061c0c420f48685ddd6163/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L664

but actually the DataFrame has 3 columns:
{code:scala}
scala> spark.range(10).createOrReplaceTempView("view1")
scala> val tables = spark.sqlContext.tables()
tables: org.apache.spark.sql.DataFrame = [database: string, tableName: string 
... 1 more field]

scala> tables.printSchema
root
 |-- database: string (nullable = false)
 |-- tableName: string (nullable = false)
 |-- isTemporary: boolean (nullable = false)


scala> tables.show
++-+---+
|database|tableName|isTemporary|
++-+---+
| default|   t1|  false|
| default|   t2|  false|
| default|  ymd|  false|
||view1|   true|
++-+---+
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33588) Partition spec in SHOW TABLE EXTENDED doesn't respect `spark.sql.caseSensitive`

2020-11-28 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33588:
--

 Summary: Partition spec in SHOW TABLE EXTENDED doesn't respect 
`spark.sql.caseSensitive`
 Key: SPARK-33588
 URL: https://issues.apache.org/jira/browse/SPARK-33588
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.8, 3.0.2, 3.1.0
Reporter: Maxim Gekk


For example:
{code:sql}
spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
 > USING parquet
 > partitioned by (year, month);
spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
spark-sql> SHOW TABLE EXTENDED LIKE 'tbl1' PARTITION(YEAR = 2015, Month = 1);
Error in query: Partition spec is invalid. The spec (YEAR, Month) must match 
the partition spec (year, month) defined in table '`default`.`tbl1`';
{code}
The spark.sql.caseSensitive flag is false by default, so the partition spec is 
valid and the command should not fail.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33585) The comment for SQLContext.tables() doesn't mention the `database` column

2020-11-28 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33585:
---
Affects Version/s: (was: 2.4.7)
   (was: 3.0.1)
   3.0.2
   2.4.8

> The comment for SQLContext.tables() doesn't mention the `database` column
> -
>
> Key: SPARK-33585
> URL: https://issues.apache.org/jira/browse/SPARK-33585
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> The comment says: "The returned DataFrame has two columns, tableName and 
> isTemporary":
> https://github.com/apache/spark/blob/b26ae98407c6c017a4061c0c420f48685ddd6163/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L664
> but actually the dataframe has 3 columns:
> {code:scala}
> scala> spark.range(10).createOrReplaceTempView("view1")
> scala> val tables = spark.sqlContext.tables()
> tables: org.apache.spark.sql.DataFrame = [database: string, tableName: string 
> ... 1 more field]
> scala> tables.printSchema
> root
>  |-- database: string (nullable = false)
>  |-- tableName: string (nullable = false)
>  |-- isTemporary: boolean (nullable = false)
> scala> tables.show
> ++-+---+
> |database|tableName|isTemporary|
> ++-+---+
> | default|   t1|  false|
> | default|   t2|  false|
> | default|  ymd|  false|
> ||view1|   true|
> ++-+---+
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33591) NULL is recognized as the "null" string in partition specs

2020-11-29 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33591:
--

 Summary: NULL is recognized as the "null" string in partition specs
 Key: SPARK-33591
 URL: https://issues.apache.org/jira/browse/SPARK-33591
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


For example:
{code:sql}
spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED BY 
(p1);
spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0;
spark-sql> SELECT isnull(p1) FROM tbl5;
false
{code}

The *p1 = null* spec is not recognized as a partition with a NULL value.
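
A quick illustrative check of what gets stored today (not part of the ticket itself):
{code:scala}
// The partition value ends up as the 4-character string "null" rather than a SQL NULL.
spark.sql("SELECT p1, isnull(p1), length(p1) FROM tbl5").show()
// expected today: p1 = "null", isnull(p1) = false, length(p1) = 4
{code}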



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33591) NULL is recognized as the "null" string in partition specs

2020-11-29 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33591:
---
Issue Type: Improvement  (was: Bug)

> NULL is recognized as the "null" string in partition specs
> --
>
> Key: SPARK-33591
> URL: https://issues.apache.org/jira/browse/SPARK-33591
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> For example:
> {code:sql}
> spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED 
> BY (p1);
> spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0;
> spark-sql> SELECT isnull(p1) FROM tbl5;
> false
> {code}
> The *p1 = null* is not recognized as a partition with NULL value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33591) NULL is recognized as the "null" string in partition specs

2020-11-29 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33591:
---
Issue Type: Bug  (was: Improvement)

> NULL is recognized as the "null" string in partition specs
> --
>
> Key: SPARK-33591
> URL: https://issues.apache.org/jira/browse/SPARK-33591
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> For example:
> {code:sql}
> spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED 
> BY (p1);
> spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0;
> spark-sql> SELECT isnull(p1) FROM tbl5;
> false
> {code}
> The *p1 = null* is not recognized as a partition with NULL value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-12-01 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241339#comment-17241339
 ] 

Maxim Gekk commented on SPARK-33571:


[~simonvanderveldt] Thank you for the detailed description and your 
investigation. Let me clarify a few things:

> From our testing we're seeing several issues:
> Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. that 
> contains fields of type `TimestampType` which contain timestamps before the 
> above mentioned moments in time without `datetimeRebaseModeInRead` set 
> doesn't raise the `SparkUpgradeException`, it succeeds without any changes to 
> the resulting dataframe compares to that dataframe in Spark 2.4.5

Spark 2.4.5 writes timestamps as the parquet INT96 type. The SQL config 
`datetimeRebaseModeInRead` does not affect reading such types in Spark 3.0.1, so 
Spark always performs rebasing (LEGACY mode). We recently added separate configs 
for INT96:
* https://github.com/apache/spark/pull/30056
* https://github.com/apache/spark/pull/30121

The changes will be released with Spark 3.1.0.

> Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. that 
> contains fields of type `TimestampType` or `DateType` which contain dates or 
> timestamps before the above mentioned moments in time with 
> `datetimeRebaseModeInRead` set to `LEGACY` results in the same values in the 
> dataframe as when using `CORRECTED`, so it seems like no rebasing is 
> happening.

For INT96, this seems to be the correct behavior. We should observe different results 
for the TIMESTAMP_MICROS and TIMESTAMP_MILLIS types; see the SQL config 
spark.sql.parquet.outputTimestampType.

The DATE case is more interesting, as we should see a difference in results for 
ancient dates. I will investigate this case. 
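
For reference, a minimal sketch of how the knobs mentioned above can be exercised, 
assuming the INT96 configs from the linked PRs keep the names 
spark.sql.legacy.parquet.int96RebaseModeInRead/Write in Spark 3.1.0:
{code:scala}
// Choose the physical Parquet type for timestamps on write; with TIMESTAMP_MICROS or
// TIMESTAMP_MILLIS the datetimeRebaseModeIn* configs apply on read.
spark.conf.set("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS")

// Reading INT96 timestamps written by Spark 2.x in Spark 3.1.0+ (sketch):
spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInRead", "LEGACY") // or "CORRECTED"
spark.read.parquet("/path/to/spark-2.4-written-data").show(false)
{code}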

 

> Handling of hybrid to proleptic calendar when reading and writing Parquet 
> data not working correctly
> 
>
> Key: SPARK-33571
> URL: https://issues.apache.org/jira/browse/SPARK-33571
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Simon
>Priority: Major
>
> The handling of old dates written with older Spark versions (<2.4.6) using 
> the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working 
> correctly.
> From what I understand it should work like this:
>  * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before 
> 1900-01-01T00:00:00Z
>  * Only applies when reading or writing parquet files
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead`
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should 
> show the same values in Spark 3.0.1. with for example `df.show()` as they did 
> in Spark 2.4.5
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `CORRECTED` the dates and timestamps 
> should show different values in Spark 3.0.1. with for example `df.show()` as 
> they did in Spark 2.4.5
>  * When writing parqet files with Spark > 3.0.0 which contain dates or 
> timestamps before the above mentioned moment in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite`
> First of all I'm not 100% sure all of this is correct. I've been unable to 
> find any clear documentation on the expected behavior. The understanding I 
> have was pieced together from the mailing list 
> ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html)]
>  the blog post linked there and looking at the Spark code.
> From our testing we're seeing several issues:
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` which contain timestamps before 
> the above mentioned moments in time without `datetimeRebaseModeInRead` set 
> doesn't raise the `SparkUpgradeException`, it succeeds without any changes to 
> the resulting dataframe compares to that dataframe in Spark 2.4.5
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` or `DateType` which contain 
> dates or timestamps before the

[jira] [Commented] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-12-01 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241379#comment-17241379
 ] 

Maxim Gekk commented on SPARK-33571:


I have tried to reproduce the issue on the master branch by reading the file 
saved by Spark 2.4.5 
(https://github.com/apache/spark/tree/master/sql/core/src/test/resources/test-data):
{code:scala}
  test("SPARK-33571: read ancient dates saved by Spark 2.4.5") {
withSQLConf(SQLConf.LEGACY_PARQUET_REBASE_MODE_IN_READ.key -> 
LEGACY.toString) {
  val path = 
getResourceParquetFilePath("test-data/before_1582_date_v2_4_5.snappy.parquet")
  val df = spark.read.parquet(path)
  df.show(false)
}
withSQLConf(SQLConf.LEGACY_PARQUET_REBASE_MODE_IN_READ.key -> 
CORRECTED.toString) {
  val path = 
getResourceParquetFilePath("test-data/before_1582_date_v2_4_5.snappy.parquet")
  val df = spark.read.parquet(path)
  df.show(false)
}
  }
{code}

The results are different in LEGACY and in CORRECTED modes:
{code}
+--+--+
|dict  |plain |
+--+--+
|1001-01-01|1001-01-01|
|1001-01-01|1001-01-02|
|1001-01-01|1001-01-03|
|1001-01-01|1001-01-04|
|1001-01-01|1001-01-05|
|1001-01-01|1001-01-06|
|1001-01-01|1001-01-07|
|1001-01-01|1001-01-08|
+--+--+

+--+--+
|dict  |plain |
+--+--+
|1001-01-07|1001-01-07|
|1001-01-07|1001-01-08|
|1001-01-07|1001-01-09|
|1001-01-07|1001-01-10|
|1001-01-07|1001-01-11|
|1001-01-07|1001-01-12|
|1001-01-07|1001-01-13|
|1001-01-07|1001-01-14|
+--+--+
{code}

> Handling of hybrid to proleptic calendar when reading and writing Parquet 
> data not working correctly
> 
>
> Key: SPARK-33571
> URL: https://issues.apache.org/jira/browse/SPARK-33571
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Simon
>Priority: Major
>
> The handling of old dates written with older Spark versions (<2.4.6) using 
> the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working 
> correctly.
> From what I understand it should work like this:
>  * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before 
> 1900-01-01T00:00:00Z
>  * Only applies when reading or writing parquet files
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead`
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should 
> show the same values in Spark 3.0.1. with for example `df.show()` as they did 
> in Spark 2.4.5
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `CORRECTED` the dates and timestamps 
> should show different values in Spark 3.0.1. with for example `df.show()` as 
> they did in Spark 2.4.5
>  * When writing parqet files with Spark > 3.0.0 which contain dates or 
> timestamps before the above mentioned moment in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite`
> First of all I'm not 100% sure all of this is correct. I've been unable to 
> find any clear documentation on the expected behavior. The understanding I 
> have was pieced together from the mailing list 
> ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html)]
>  the blog post linked there and looking at the Spark code.
> From our testing we're seeing several issues:
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` which contain timestamps before 
> the above mentioned moments in time without `datetimeRebaseModeInRead` set 
> doesn't raise the `SparkUpgradeException`, it succeeds without any changes to 
> the resulting dataframe compares to that dataframe in Spark 2.4.5
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` or `DateType` which contain 
> dates or timestamps before the above mentioned moments in time with 
> `datetimeRebaseModeInRead` set to `LEGACY` results in the same values in the 
> dataframe as when using `CORRECTED`, so it seems like no rebasing is 
> happening.
> I've ma

[jira] [Commented] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-12-01 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241400#comment-17241400
 ] 

Maxim Gekk commented on SPARK-33571:


Spark 3.0.1 shows different results as well:
{code:scala}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.1
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_275)
scala> 
spark.read.parquet("/Users/maximgekk/proj/parquet-read-2_4_5_files/sql/core/src/test/resources/test-data/before_1582_date_v2_4_5.snappy.parquet").show(false)
20/12/01 12:31:59 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
org.apache.spark.SparkUpgradeException: You may get a different result due to 
the upgrading of Spark 3.0: reading dates before 1582-10-15 or timestamps 
before 1900-01-01T00:00:00Z from Parquet files can be ambiguous, as the files 
may be written by Spark 2.x or legacy versions of Hive, which uses a legacy 
hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian 
calendar. See more details in SPARK-31404. You can set 
spark.sql.legacy.parquet.datetimeRebaseModeInRead to 'LEGACY' to rebase the 
datetime values w.r.t. the calendar difference during reading. Or set 
spark.sql.legacy.parquet.datetimeRebaseModeInRead to 'CORRECTED' to read the 
datetime values as it is.

scala> spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", 
"LEGACY")

scala> 
spark.read.parquet("/Users/maximgekk/proj/parquet-read-2_4_5_files/sql/core/src/test/resources/test-data/before_1582_date_v2_4_5.snappy.parquet").show(false)
+--+--+
|dict  |plain |
+--+--+
|1001-01-01|1001-01-01|
|1001-01-01|1001-01-02|
|1001-01-01|1001-01-03|
|1001-01-01|1001-01-04|
|1001-01-01|1001-01-05|
|1001-01-01|1001-01-06|
|1001-01-01|1001-01-07|
|1001-01-01|1001-01-08|
+--+--+


scala> spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", 
"CORRECTED")

scala> 
spark.read.parquet("/Users/maximgekk/proj/parquet-read-2_4_5_files/sql/core/src/test/resources/test-data/before_1582_date_v2_4_5.snappy.parquet").show(false)
+--+--+
|dict  |plain |
+--+--+
|1001-01-07|1001-01-07|
|1001-01-07|1001-01-08|
|1001-01-07|1001-01-09|
|1001-01-07|1001-01-10|
|1001-01-07|1001-01-11|
|1001-01-07|1001-01-12|
|1001-01-07|1001-01-13|
|1001-01-07|1001-01-14|
+--+--+

{code}

> Handling of hybrid to proleptic calendar when reading and writing Parquet 
> data not working correctly
> 
>
> Key: SPARK-33571
> URL: https://issues.apache.org/jira/browse/SPARK-33571
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Simon
>Priority: Major
>
> The handling of old dates written with older Spark versions (<2.4.6) using 
> the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working 
> correctly.
> From what I understand it should work like this:
>  * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before 
> 1900-01-01T00:00:00Z
>  * Only applies when reading or writing parquet files
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead`
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should 
> show the same values in Spark 3.0.1. with for example `df.show()` as they did 
> in Spark 2.4.5
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `CORRECTED` the dates and timestamps 
> should show different values in Spark 3.0.1. with for example `df.show()` as 
> they did in Spark 2.4.5
>  * When writing parqet files with Spark > 3.0.0 which contain dates or 
> timestamps before the above mentioned moment in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite`
> First of all I'm not 100% sure all of this is correct. I've been unable to 
> find any clear documentation on the expected behavior. The understanding I 
> have was pieced together from the mailing list 
> ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html)]
>  the blog po

[jira] [Commented] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-12-01 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241408#comment-17241408
 ] 

Maxim Gekk commented on SPARK-33571:


[~simonvanderveldt] Looking at the dates you tested, both 1880-10-01 and 
2020-10-01 belong to the Gregorian calendar, so there should be no diffs.

For the date 0220-10-01, please have a look at the table I built in the 
PR: https://github.com/apache/spark/pull/28067 . The table shows that there is 
no diff between the two calendars for that year.

> Handling of hybrid to proleptic calendar when reading and writing Parquet 
> data not working correctly
> 
>
> Key: SPARK-33571
> URL: https://issues.apache.org/jira/browse/SPARK-33571
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Simon
>Priority: Major
>
> The handling of old dates written with older Spark versions (<2.4.6) using 
> the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working 
> correctly.
> From what I understand it should work like this:
>  * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before 
> 1900-01-01T00:00:00Z
>  * Only applies when reading or writing parquet files
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead`
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should 
> show the same values in Spark 3.0.1. with for example `df.show()` as they did 
> in Spark 2.4.5
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `CORRECTED` the dates and timestamps 
> should show different values in Spark 3.0.1. with for example `df.show()` as 
> they did in Spark 2.4.5
>  * When writing parqet files with Spark > 3.0.0 which contain dates or 
> timestamps before the above mentioned moment in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite`
> First of all I'm not 100% sure all of this is correct. I've been unable to 
> find any clear documentation on the expected behavior. The understanding I 
> have was pieced together from the mailing list 
> ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html)]
>  the blog post linked there and looking at the Spark code.
> From our testing we're seeing several issues:
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` which contain timestamps before 
> the above mentioned moments in time without `datetimeRebaseModeInRead` set 
> doesn't raise the `SparkUpgradeException`, it succeeds without any changes to 
> the resulting dataframe compares to that dataframe in Spark 2.4.5
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` or `DateType` which contain 
> dates or timestamps before the above mentioned moments in time with 
> `datetimeRebaseModeInRead` set to `LEGACY` results in the same values in the 
> dataframe as when using `CORRECTED`, so it seems like no rebasing is 
> happening.
> I've made some scripts to help with testing/show the behavior, it uses 
> pyspark 2.4.5, 2.4.6 and 3.0.1. You can find them here 
> [https://github.com/simonvanderveldt/spark3-rebasemode-issue]. I'll post the 
> outputs in a comment below as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33650) Misleading error from ALTER TABLE .. PARTITION for non-supported partition management table

2020-12-03 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33650:
--

 Summary: Misleading error from ALTER TABLE .. PARTITION for 
non-supported partition management table
 Key: SPARK-33650
 URL: https://issues.apache.org/jira/browse/SPARK-33650
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


For a V2 table that doesn't support partition management, ALTER TABLE .. 
ADD/DROP PARTITION throws a misleading exception:
{code:java}
PartitionSpecs are not resolved;;
'AlterTableAddPartition [UnresolvedPartitionSpec(Map(id -> 1),None)], false
+- ResolvedTable org.apache.spark.sql.connector.InMemoryTableCatalog@2fd64b11, 
ns1.ns2.tbl, org.apache.spark.sql.connector.InMemoryTable@5d3ff859

org.apache.spark.sql.AnalysisException: PartitionSpecs are not resolved;;
'AlterTableAddPartition [UnresolvedPartitionSpec(Map(id -> 1),None)], false
+- ResolvedTable org.apache.spark.sql.connector.InMemoryTableCatalog@2fd64b11, 
ns1.ns2.tbl, org.apache.spark.sql.connector.InMemoryTable@5d3ff859

at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:49)
{code}

The error should say that the table doesn't support partition management.
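
A rough sketch of the kind of early check that could produce a clearer message 
(illustrative only, not the actual patch):
{code:scala}
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.connector.catalog.{SupportsPartitionManagement, Table}

// Reject ALTER TABLE .. ADD/DROP PARTITION early with an explicit message
// when the resolved V2 table cannot manage partitions.
def requirePartitionManagement(table: Table, name: String): SupportsPartitionManagement =
  table match {
    case t: SupportsPartitionManagement => t
    case _ =>
      throw new AnalysisException(s"Table $name does not support partition management")
  }
{code}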



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-12-03 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243441#comment-17243441
 ] 

Maxim Gekk commented on SPARK-33571:


I opened the PR [https://github.com/apache/spark/pull/30596] with some 
improvements for config docs. [~hyukjin.kwon] [~cloud_fan] could you review it, 
please.

> Handling of hybrid to proleptic calendar when reading and writing Parquet 
> data not working correctly
> 
>
> Key: SPARK-33571
> URL: https://issues.apache.org/jira/browse/SPARK-33571
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Simon
>Priority: Major
>
> The handling of old dates written with older Spark versions (<2.4.6) using 
> the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working 
> correctly.
> From what I understand it should work like this:
>  * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before 
> 1900-01-01T00:00:00Z
>  * Only applies when reading or writing parquet files
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead`
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should 
> show the same values in Spark 3.0.1. with for example `df.show()` as they did 
> in Spark 2.4.5
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `CORRECTED` the dates and timestamps 
> should show different values in Spark 3.0.1. with for example `df.show()` as 
> they did in Spark 2.4.5
>  * When writing parqet files with Spark > 3.0.0 which contain dates or 
> timestamps before the above mentioned moment in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite`
> First of all I'm not 100% sure all of this is correct. I've been unable to 
> find any clear documentation on the expected behavior. The understanding I 
> have was pieced together from the mailing list 
> ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html)]
>  the blog post linked there and looking at the Spark code.
> From our testing we're seeing several issues:
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` which contain timestamps before 
> the above mentioned moments in time without `datetimeRebaseModeInRead` set 
> doesn't raise the `SparkUpgradeException`, it succeeds without any changes to 
> the resulting dataframe compares to that dataframe in Spark 2.4.5
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` or `DateType` which contain 
> dates or timestamps before the above mentioned moments in time with 
> `datetimeRebaseModeInRead` set to `LEGACY` results in the same values in the 
> dataframe as when using `CORRECTED`, so it seems like no rebasing is 
> happening.
> I've made some scripts to help with testing/show the behavior, it uses 
> pyspark 2.4.5, 2.4.6 and 3.0.1. You can find them here 
> [https://github.com/simonvanderveldt/spark3-rebasemode-issue]. I'll post the 
> outputs in a comment below as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33667) Respect case sensitivity in V1 SHOW PARTITIONS

2020-12-04 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33667:
--

 Summary: Respect case sensitivity in V1 SHOW PARTITIONS
 Key: SPARK-33667
 URL: https://issues.apache.org/jira/browse/SPARK-33667
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.4.8, 3.0.2, 3.1.0
Reporter: Maxim Gekk


SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config 
*spark.sql.caseSensitive* which is true by default, for instance:
{code:sql}
spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
 > USING parquet
 > PARTITIONED BY (year, month);
spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1);
Error in query: Non-partitioning column(s) [YEAR, Month] are specified for SHOW 
PARTITIONS;
{code}
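
A minimal sketch of the normalization that would make the command respect the config 
(illustrative, not the actual patch):
{code:scala}
// Map user-provided spec keys onto the table's partition columns using a resolver;
// with spark.sql.caseSensitive=false the resolver is case-insensitive, so
// YEAR/Month resolve to year/month.
def normalizePartitionSpec(
    spec: Map[String, String],
    partCols: Seq[String],
    resolver: (String, String) => Boolean): Map[String, String] =
  spec.map { case (key, value) =>
    val col = partCols.find(resolver(_, key)).getOrElse(
      throw new IllegalArgumentException(s"$key is not a partitioning column"))
    col -> value
  }
{code}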
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33667) Respect case sensitivity in V1 SHOW PARTITIONS

2020-12-04 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33667:
---
Description: 
SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config 
*spark.sql.caseSensitive* which is false by default, for instance:
{code:sql}
spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
 > USING parquet
 > PARTITIONED BY (year, month);
spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1);
Error in query: Non-partitioning column(s) [YEAR, Month] are specified for SHOW 
PARTITIONS;
{code}
 

  was:
SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config 
*spark.sql.caseSensitive* which is true by default, for instance:
{code:sql}
spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
 > USING parquet
 > PARTITIONED BY (year, month);
spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1);
Error in query: Non-partitioning column(s) [YEAR, Month] are specified for SHOW 
PARTITIONS;
{code}
 


> Respect case sensitivity in V1 SHOW PARTITIONS
> --
>
> Key: SPARK-33667
> URL: https://issues.apache.org/jira/browse/SPARK-33667
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config 
> *spark.sql.caseSensitive* which is false by default, for instance:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int)
>  > USING parquet
>  > PARTITIONED BY (year, month);
> spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1;
> spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1);
> Error in query: Non-partitioning column(s) [YEAR, Month] are specified for 
> SHOW PARTITIONS;
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33670) Verify the partition provider is Hive in v1 SHOW TABLE EXTENDED

2020-12-05 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33670:
--

 Summary: Verify the partition provider is Hive in v1 SHOW TABLE 
EXTENDED
 Key: SPARK-33670
 URL: https://issues.apache.org/jira/browse/SPARK-33670
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.2, 3.1.0
Reporter: Maxim Gekk


Invoke the check verifyPartitionProviderIsHive() from the v1 implementation of SHOW 
TABLE EXTENDED.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33671) Remove VIEW checks from V1 table commands

2020-12-05 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33671:
--

 Summary: Remove VIEW checks from V1 table commands
 Key: SPARK-33671
 URL: https://issues.apache.org/jira/browse/SPARK-33671
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.2, 3.1.0
Reporter: Maxim Gekk


Checking of VIEWs is performed earlier, see 
https://github.com/apache/spark/pull/30461 . So, the checks can be removed from 
some V1 commands.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33672) Check SQLContext.tables() for V2 session catalog

2020-12-05 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33672:
--

 Summary: Check SQLContext.tables() for V2 session catalog
 Key: SPARK-33672
 URL: https://issues.apache.org/jira/browse/SPARK-33672
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


V1 ShowTablesCommand is hard coded in SQLContext:
https://github.com/apache/spark/blob/a088a801ed8c17171545c196a3f26ce415de0cd1/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L671
The ticket aims to check the tables() behavior for the V2 session catalog.
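
A hedged reproduction sketch (the catalog class name below is a placeholder for 
whatever V2 session catalog implementation is plugged in):
{code:scala}
// With a custom V2 session catalog installed, SQLContext.tables() still goes through
// the hard-coded V1 ShowTablesCommand; the ticket is about verifying what it returns.
spark.conf.set("spark.sql.catalog.spark_catalog", "com.example.MyV2SessionCatalog") // placeholder
spark.sql("CREATE TABLE t1 (id INT) USING parquet")
spark.sqlContext.tables().show() // does the listing reflect the V2 session catalog?
{code}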



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33672) Check SQLContext.tables() for V2 session catalog

2020-12-05 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17244542#comment-17244542
 ] 

Maxim Gekk commented on SPARK-33672:


[~cloud_fan] FYI

> Check SQLContext.tables() for V2 session catalog
> 
>
> Key: SPARK-33672
> URL: https://issues.apache.org/jira/browse/SPARK-33672
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> V1 ShowTablesCommand is hard coded in SQLContext:
> https://github.com/apache/spark/blob/a088a801ed8c17171545c196a3f26ce415de0cd1/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L671
> The ticket aims to check the tables() behavior for the V2 session catalog.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33676) Require exact matched partition spec to schema in ADD/DROP PARTITION

2020-12-06 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33676:
--

 Summary: Require exact matched partition spec to schema in 
ADD/DROP PARTITION
 Key: SPARK-33676
 URL: https://issues.apache.org/jira/browse/SPARK-33676
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


The V1 implementation of ALTER TABLE .. ADD/DROP PARTITION fails when the 
partition spec doesn't exactly match the partition schema:
{code:sql}
ALTER TABLE tab1 ADD PARTITION (A='9')
Partition spec is invalid. The spec (a) must match the partition spec (a, b) 
defined in table '`dbx`.`tab1`';
org.apache.spark.sql.AnalysisException: Partition spec is invalid. The spec (a) 
must match the partition spec (a, b) defined in table '`dbx`.`tab1`';
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.$anonfun$requireExactMatchedPartitionSpec$1(SessionCatalog.scala:1173)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.$anonfun$requireExactMatchedPartitionSpec$1$adapted(SessionCatalog.scala:1171)
at scala.collection.immutable.List.foreach(List.scala:392)
{code}
for a table partitioned by "a", "b", but the V2 implementation adds the wrong 
partition silently.
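
A minimal sketch of the exact-match requirement the V2 implementation could mirror 
(illustrative; the V1 counterpart is the requireExactMatchedPartitionSpec check 
visible in the stack trace above):
{code:scala}
// For a table partitioned by (a, b), a spec such as (a) alone must be rejected.
def requireExactMatchedPartitionSpec(
    spec: Map[String, String],
    partCols: Seq[String]): Unit = {
  if (spec.keySet != partCols.toSet) {
    throw new IllegalArgumentException(
      s"Partition spec is invalid. The spec (${spec.keys.mkString(", ")}) must match " +
      s"the partition spec (${partCols.mkString(", ")}) defined in the table")
  }
}
{code}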



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33688) Migrate SHOW TABLE EXTENDED to new resolution framework

2020-12-07 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33688:
--

 Summary: Migrate SHOW TABLE EXTENDED to new resolution framework
 Key: SPARK-33688
 URL: https://issues.apache.org/jira/browse/SPARK-33688
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


# Create the Command logical node for SHOW TABLE EXTENDED
# Remove ShowTableStatement



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33688) Migrate SHOW TABLE EXTENDED to new resolution framework

2020-12-07 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245143#comment-17245143
 ] 

Maxim Gekk commented on SPARK-33688:


I am working on this.

> Migrate SHOW TABLE EXTENDED to new resolution framework
> ---
>
> Key: SPARK-33688
> URL: https://issues.apache.org/jira/browse/SPARK-33688
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> # Create the Command logical node for SHOW TABLE EXTENDED
> # Remove ShowTableStatement



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33670) Verify the partition provider is Hive in v1 SHOW TABLE EXTENDED

2020-12-07 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245174#comment-17245174
 ] 

Maxim Gekk commented on SPARK-33670:


[~hyukjin.kwon] Just in case, which "Affects Version" should be specified - an 
already released version or the current unreleased one? For example, I specified 
3.0.2 but it has not been released yet. Maybe I should set 3.0.1?

> Verify the partition provider is Hive in v1 SHOW TABLE EXTENDED
> ---
>
> Key: SPARK-33670
> URL: https://issues.apache.org/jira/browse/SPARK-33670
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.2, 3.1.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
> Fix For: 2.4.8, 3.0.2, 3.1.0
>
>
> Invoke the check verifyPartitionProviderIsHive() from v1 implementation of 
> SHOW TABLE EXTENDED.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33441) Add "unused-import" compile arg to scalac and remove all unused imports in Scala code

2020-12-07 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245723#comment-17245723
 ] 

Maxim Gekk commented on SPARK-33441:


Would it be possible to check for unused imports in Java code as well?

> Add "unused-import" compile arg to scalac and remove all unused imports in 
> Scala code 
> --
>
> Key: SPARK-33441
> URL: https://issues.apache.org/jira/browse/SPARK-33441
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.1.0
>
>
> * Add a new Scala compile arg to defend against new unused imports:
>  ** "-Ywarn-unused-import" for Scala 2.12
>  ** "-Wconf:cat=unused-imports:ws" or "-Wconf:cat=unused-imports:error" for 
> Scala 2.13
>  * Remove all unused imports in Scala code 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33706) Require fully specified partition identifier in partitionExists()

2020-12-07 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33706:
--

 Summary: Require fully specified partition identifier in 
partitionExists()
 Key: SPARK-33706
 URL: https://issues.apache.org/jira/browse/SPARK-33706
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


Currently, partitionExists() from SupportsPartitionManagement accepts any 
partition identifier, even one that is not fully specified. This ticket aims to add 
a check that the lengths of the partition schema and the partition identifier match 
exactly, so partially specified IDs are prohibited.
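
A sketch of the intended check, following the SupportsPartitionManagement API (the 
error wording is illustrative):
{code:scala}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types.StructType

// Reject identifiers that do not cover every partition column before looking them up.
def checkFullySpecified(ident: InternalRow, partitionSchema: StructType): Unit =
  require(ident.numFields == partitionSchema.length,
    s"The identifier has ${ident.numFields} fields while the partition schema " +
    s"has ${partitionSchema.length} columns, so it may not refer to a single partition")
{code}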



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33558) Unify v1 and v2 ALTER TABLE .. ADD PARTITION tests

2020-12-09 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33558:
---
Summary: Unify v1 and v2 ALTER TABLE .. ADD PARTITION tests  (was: Unify v1 
and v2 ALTER TABLE .. PARTITION tests)

> Unify v1 and v2 ALTER TABLE .. ADD PARTITION tests
> --
>
> Key: SPARK-33558
> URL: https://issues.apache.org/jira/browse/SPARK-33558
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Extract ALTER TABLE .. PARTITION tests to a common place to run them for V1 
> and V2 datasources. Some tests can be placed in V1- and V2-specific test 
> suites.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33558) Unify v1 and v2 ALTER TABLE .. ADD PARTITION tests

2020-12-09 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33558:
---
Description: Extract ALTER TABLE .. ADD PARTITION tests to a common place 
to run them for V1 and V2 datasources. Some tests can be placed in V1- and V2-specific 
test suites.  (was: Extract ALTER TABLE .. PARTITION tests to a common place to run 
them for V1 and V2 datasources. Some tests can be placed in V1- and V2-specific test 
suites.)

> Unify v1 and v2 ALTER TABLE .. ADD PARTITION tests
> --
>
> Key: SPARK-33558
> URL: https://issues.apache.org/jira/browse/SPARK-33558
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Extract ALTER TABLE .. ADD PARTITION tests to a common place to run them 
> for V1 and V2 datasources. Some tests can be placed in V1- and V2-specific 
> test suites.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-12-09 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246717#comment-17246717
 ] 

Maxim Gekk commented on SPARK-33571:


> The behavior of the to be introduced in Spark 3.1 
> `spark.sql.legacy.parquet.int96RebaseModeIn*` is the same as for 
> `datetimeRebaseModeIn*`?

Yes.

> So Spark will check the parquet metadata for Spark version and the 
> `datetimeRebaseModeInRead` metadata key and use the correct behavior.

Correct, except for the names of the metadata keys. Spark checks them, see 
https://github.com/MaxGekk/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/package.scala#L58-L68

> If those are not set it will raise an exception and ask the user to define 
> the mode. Is that correct?

Yes. Spark should raise the exception if it is not clear which calendar the 
writer used.

> but from my testing Spark 3 does the same by default, not sure if that aligns 
> with your findings?

Spark 3.0.0-SNAPSHOT saved timestamps as TIMESTAMP_MICROS in parquet until 
https://github.com/apache/spark/pull/28450 . I just wanted to say that the 
datetimeRebaseModeIn* configs you pointed out don't affect INT96 in Spark 3.0.

> What is the expected behavior for TIMESTAMP_MICROS and TIMESTAMP_MILLIS with 
> regards to this?

The same as for the DATE type. Spark takes into account the same SQL configs and 
metadata keys from parquet files.
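
For reference, a hedged sketch of how to peek at the footer metadata Spark relies on 
here (the key names are the ones defined in the package object linked above):
{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile

// Inspect the writer's Spark version and the legacy-calendar marker in a Parquet footer.
val input = HadoopInputFile.fromPath(
  new Path("/path/to/part-00000.snappy.parquet"), new Configuration())
val reader = ParquetFileReader.open(input)
try {
  val kv = reader.getFooter.getFileMetaData.getKeyValueMetaData
  println(kv.get("org.apache.spark.version"))        // e.g. "2.4.5" for files written by Spark 2.4.5
  println(kv.get("org.apache.spark.legacyDateTime")) // present only if legacy rebasing was used on write
} finally {
  reader.close()
}
{code}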



> Handling of hybrid to proleptic calendar when reading and writing Parquet 
> data not working correctly
> 
>
> Key: SPARK-33571
> URL: https://issues.apache.org/jira/browse/SPARK-33571
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Simon
>Priority: Major
> Fix For: 3.1.0
>
>
> The handling of old dates written with older Spark versions (<2.4.6) using 
> the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working 
> correctly.
> From what I understand it should work like this:
>  * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before 
> 1900-01-01T00:00:00Z
>  * Only applies when reading or writing parquet files
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead`
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should 
> show the same values in Spark 3.0.1. with for example `df.show()` as they did 
> in Spark 2.4.5
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `CORRECTED` the dates and timestamps 
> should show different values in Spark 3.0.1. with for example `df.show()` as 
> they did in Spark 2.4.5
>  * When writing parqet files with Spark > 3.0.0 which contain dates or 
> timestamps before the above mentioned moment in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite`
> First of all I'm not 100% sure all of this is correct. I've been unable to 
> find any clear documentation on the expected behavior. The understanding I 
> have was pieced together from the mailing list 
> ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html)]
>  the blog post linked there and looking at the Spark code.
> From our testing we're seeing several issues:
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` which contain timestamps before 
> the above mentioned moments in time without `datetimeRebaseModeInRead` set 
> doesn't raise the `SparkUpgradeException`, it succeeds without any changes to 
> the resulting dataframe compared to that dataframe in Spark 2.4.5
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` or `DateType` which contain 
> dates or timestamps before the above mentioned moments in time with 
> `datetimeRebaseModeInRead` set to `LEGACY` results in the same values in the 
> dataframe as when using `CORRECTED`, so it seems like no rebasing is 
> happening.
> I've made some scripts to help with testing/show the behavior, it uses 
> pyspark 2.4.5, 2.4.6 and 3.0.1. You can find them here 
> [https://github.com/simonvanderveldt/spark3-reba

[jira] [Created] (SPARK-33742) Throw PartitionsAlreadyExistException from HiveExternalCatalog.createPartitions()

2020-12-10 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33742:
--

 Summary: Throw PartitionsAlreadyExistException from 
HiveExternalCatalog.createPartitions()
 Key: SPARK-33742
 URL: https://issues.apache.org/jira/browse/SPARK-33742
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.1, 2.4.7, 3.1.0
Reporter: Maxim Gekk


HiveExternalCatalog.createPartitions throws AlreadyExistsException wrapped in an 
AnalysisException. This behavior deviates from the V1/V2 in-memory catalogs, which 
throw PartitionsAlreadyExistException.
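
An illustrative sketch of the intended alignment (not the actual patch), translating 
Hive's exception into the one the other catalogs already throw:
{code:scala}
import org.apache.hadoop.hive.metastore.api.AlreadyExistsException
import org.apache.spark.sql.catalyst.analysis.PartitionsAlreadyExistException
import org.apache.spark.sql.catalyst.catalog.CatalogTablePartition

// Wrap the metastore call so callers see the same exception type as with
// the V1/V2 in-memory catalogs.
def createPartitionsAligned(
    db: String,
    table: String,
    parts: Seq[CatalogTablePartition],
    doCreate: () => Unit): Unit =
  try {
    doCreate()
  } catch {
    case _: AlreadyExistsException =>
      throw new PartitionsAlreadyExistException(db, table, parts.map(_.spec))
  }
{code}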



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33767) Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests

2020-12-12 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33767:
--

 Summary: Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests
 Key: SPARK-33767
 URL: https://issues.apache.org/jira/browse/SPARK-33767
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk
Assignee: Maxim Gekk
 Fix For: 3.1.0


Extract ALTER TABLE .. ADD PARTITION tests to a common place to run them for 
V1 and V2 datasources. Some tests can be placed in V1- and V2-specific test 
suites.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33767) Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests

2020-12-12 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33767:
---
Description: Extract ALTER TABLE .. DROP PARTITION tests to a common 
place to run them for V1 and V2 datasources. Some tests can be placed in V1- and 
V2-specific test suites.  (was: Extract ALTER TABLE .. ADD PARTITION tests to 
a common place to run them for V1 and V2 datasources. Some tests can be 
placed in V1- and V2-specific test suites.)

> Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests
> ---
>
> Key: SPARK-33767
> URL: https://issues.apache.org/jira/browse/SPARK-33767
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.1.0
>
>
> Extract ALTER TABLE .. DROP PARTITION tests to a common place to run them 
> for V1 and V2 datasources. Some tests can be placed in V1- and V2-specific 
> test suites.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33768) Remove unused parameter `retainData` from AlterTableDropPartition

2020-12-12 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33768:
--

 Summary: Remove unused parameter `retainData` from 
AlterTableDropPartition
 Key: SPARK-33768
 URL: https://issues.apache.org/jira/browse/SPARK-33768
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


The parameter is hard-coded to false while parsing in AstBuilder. The parameter 
can be removed from the logical node.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33770) Test failures: ALTER TABLE .. DROP PARTITION tries to delete files out of partition path

2020-12-13 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33770:
--

 Summary: Test failures: ALTER TABLE .. DROP PARTITION tries to 
delete files out of partition path
 Key: SPARK-33770
 URL: https://issues.apache.org/jira/browse/SPARK-33770
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


For example: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132719/testReport/org.apache.spark.sql.hive.execution.command/AlterTableAddPartitionSuite/ALTER_TABLEADD_PARTITION_Hive_V1__SPARK_33521__universal_type_conversions_of_partition_values/

{code:java}
org.apache.spark.sql.hive.execution.command.AlterTableAddPartitionSuite.ALTER 
TABLE .. ADD PARTITION Hive V1: SPARK-33521: universal type conversions of 
partition values
sbt.ForkMain$ForkError: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: File 
file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-38fe2706-33e5-469a-ba3a-682391e02179
 does not exist;
at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.dropPartitions(HiveExternalCatalog.scala:1014)
at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.dropPartitions(ExternalCatalogWithListener.scala:211)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.dropPartitions(SessionCatalog.scala:1036)
at 
org.apache.spark.sql.execution.command.AlterTableDropPartitionCommand.run(ddl.scala:582)
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33777) Sort output of V2 SHOW PARTITIONS

2020-12-14 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33777:
---
Summary: Sort output of V2 SHOW PARTITIONS  (was: Sort output of SHOW 
PARTITIONS V2)

> Sort output of V2 SHOW PARTITIONS
> -
>
> Key: SPARK-33777
> URL: https://issues.apache.org/jira/browse/SPARK-33777
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> The V1 SHOW PARTITIONS command sorts its results. Both V1 implementations, 
> in-memory and Hive catalog (according to the Hive docs 
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowPartitions]),
> perform sorting. V2 should have the same behavior.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33777) Sort output of SHOW PARTITIONS V2

2020-12-14 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33777:
--

 Summary: Sort output of SHOW PARTITIONS V2
 Key: SPARK-33777
 URL: https://issues.apache.org/jira/browse/SPARK-33777
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


The V1 SHOW PARTITIONS command sorts its results. Both V1 implementations, in-memory 
and Hive catalog (according to the Hive docs 
[https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowPartitions]),
 perform sorting. V2 should have the same behavior.
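
A minimal sketch of the intended change (illustrative; it just sorts the produced 
partition names the way the V1 command does):
{code:scala}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.unsafe.types.UTF8String

// Given partition names such as "year=2015/month=1", return sorted output rows.
def toSortedRows(partNames: Seq[String]): Seq[InternalRow] =
  partNames.sorted.map(name => InternalRow(UTF8String.fromString(name)))
{code}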



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


