[jira] [Created] (SPARK-29920) Parsing failure on interval '20 15' day to hour
Maxim Gekk created SPARK-29920: -- Summary: Parsing failure on interval '20 15' day to hour Key: SPARK-29920 URL: https://issues.apache.org/jira/browse/SPARK-29920 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk {code:sql} spark-sql> select interval '20 15' day to hour; Error in query: requirement failed: Interval string must match day-time format of 'd h:m:s.n': 20 15(line 1, pos 16) == SQL == select interval '20 15' day to hour ^^^ {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29927) Parse timestamps in microsecond precision by `to_timestamp`, `to_unix_timestamp`, `unix_timestamp`
Maxim Gekk created SPARK-29927: -- Summary: Parse timestamps in microsecond precision by `to_timestamp`, `to_unix_timestamp`, `unix_timestamp` Key: SPARK-29927 URL: https://issues.apache.org/jira/browse/SPARK-29927 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.4 Reporter: Maxim Gekk Currently, the `to_timestamp`, `to_unix_timestamp`, `unix_timestamp` functions use SimpleDateFormat to parse strings to timestamps. SimpleDateFormat can parse only up to millisecond precision when a user specifies `SSS` in a pattern. This ticket aims to support parsing up to microsecond precision. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
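A minimal spark-shell sketch of the limitation (illustrative only, not taken from the ticket; the exact output depends on the Spark version):
{code:scala}
// Hypothetical repro (assumes spark.implicits._ is in scope, as in spark-shell).
// The input carries microseconds, but a SimpleDateFormat-based parser treats the
// 'S' letters as a single millisecond field, so precision below 1 ms is lost.
import org.apache.spark.sql.functions.to_timestamp

val df = Seq("2019-11-14 20:35:30.123456").toDF("ts_str")
df.select(to_timestamp($"ts_str", "yyyy-MM-dd HH:mm:ss.SSSSSS").as("ts"))
  .show(false)
{code}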
[jira] [Commented] (SPARK-29927) Parse timestamps in microsecond precision by `to_timestamp`, `to_unix_timestamp`, `unix_timestamp`
[ https://issues.apache.org/jira/browse/SPARK-29927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975697#comment-16975697 ] Maxim Gekk commented on SPARK-29927: [~cloud_fan] WDYT, does it make sense to change the functions as well? > Parse timestamps in microsecond precision by `to_timestamp`, > `to_unix_timestamp`, `unix_timestamp` > -- > > Key: SPARK-29927 > URL: https://issues.apache.org/jira/browse/SPARK-29927 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.4 >Reporter: Maxim Gekk >Priority: Major > > Currently, the `to_timestamp`, `to_unix_timestamp`, `unix_timestamp` > functions uses SimpleDateFormat to parse strings to timestamps. > SimpleDateFormat is able to parse only in millisecond precision if an user > specified `SSS` in a pattern. The ticket aims to support parsing up to the > microsecond precision. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29904) Parse timestamps in microsecond precision by JSON/CSV datasources
[ https://issues.apache.org/jira/browse/SPARK-29904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-29904: --- Affects Version/s: 2.4.0 2.4.1 2.4.2 2.4.3 > Parse timestamps in microsecond precision by JSON/CSV datasources > - > > Key: SPARK-29904 > URL: https://issues.apache.org/jira/browse/SPARK-29904 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 2.4.5 > > > Currently, Spark can parse strings with timestamps from JSON/CSV in > millisecond precision. Internally, timestamps have microsecond precision. The > ticket aims to modify the parsing logic in Spark 2.4 to support microsecond > precision. Porting DateFormatter/TimestampFormatter from Spark 3.0-preview > is risky, so we need to find another, lighter solution. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
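For reference, a hedged sketch of the kind of round trip the 2.4 fix targets (the record and the timestamp pattern are illustrative, not taken from the ticket):
{code:scala}
// Read a JSON record whose timestamp carries six fractional digits and check
// whether the microseconds survive parsing; before the fix, 2.4 keeps only
// millisecond precision.
import spark.implicits._

val ds = Seq("""{"t": "2019-11-14 20:35:30.123456"}""").toDS()
spark.read
  .schema("t TIMESTAMP")
  .option("timestampFormat", "yyyy-MM-dd HH:mm:ss.SSSSSS")
  .json(ds)
  .show(false)
{code}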
[jira] [Created] (SPARK-29928) Check parsing timestamps up to microsecond precision by JSON/CSV datasource
Maxim Gekk created SPARK-29928: -- Summary: Check parsing timestamps up to microsecond precision by JSON/CSV datasource Key: SPARK-29928 URL: https://issues.apache.org/jira/browse/SPARK-29928 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Port tests added for 2.4 by the commit: https://github.com/apache/spark/commit/9c7e8be1dca8285296f3052c41f35043699d7d10 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29930) Remove SQL configs declared to be removed in Spark 3.0
Maxim Gekk created SPARK-29930: -- Summary: Remove SQL configs declared to be removed in Spark 3.0 Key: SPARK-29930 URL: https://issues.apache.org/jira/browse/SPARK-29930 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Need to remove the following SQL configs: * spark.sql.fromJsonForceNullableSchema * spark.sql.legacy.compareDateTimestampInTimestamp -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29931) Declare all SQL legacy configs as will be removed in Spark 4.0
Maxim Gekk created SPARK-29931: -- Summary: Declare all SQL legacy configs as will be removed in Spark 4.0 Key: SPARK-29931 URL: https://issues.apache.org/jira/browse/SPARK-29931 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Add the following sentence to the descriptions of all legacy SQL configs that existed before Spark 3.0: "This config will be removed in Spark 4.0.". Here is the list of such configs: * spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName * spark.sql.legacy.literal.pickMinimumPrecision * spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation * spark.sql.legacy.sizeOfNull * spark.sql.legacy.replaceDatabricksSparkAvro.enabled * spark.sql.legacy.setopsPrecedence.enabled * spark.sql.legacy.integralDivide.returnBigint * spark.sql.legacy.bucketedTableScan.outputOrdering * spark.sql.legacy.parser.havingWithoutGroupByAsWhere * spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue * spark.sql.legacy.setCommandRejectsSparkCoreConfs * spark.sql.legacy.utcTimestampFunc.enabled * spark.sql.legacy.typeCoercion.datetimeToString * spark.sql.legacy.looseUpcast * spark.sql.legacy.ctePrecedence.enabled * spark.sql.legacy.arrayExistsFollowsThreeValuedLogic -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
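To picture the change, here is a hedged sketch of what a single affected definition in SQLConf might look like afterwards (the config shown and its description are abridged examples, not the exact code):
{code:scala}
// Illustrative only: the removal notice is appended to the existing .doc()
// text of a legacy config defined through SQLConf.buildConf.
val LEGACY_SIZE_OF_NULL = buildConf("spark.sql.legacy.sizeOfNull")
  .doc("If it is set to true, size of null returns -1. " +
    "This config will be removed in Spark 4.0.")
  .booleanConf
  .createWithDefault(true)
{code}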
[jira] [Commented] (SPARK-29931) Declare all SQL legacy configs as will be removed in Spark 4.0
[ https://issues.apache.org/jira/browse/SPARK-29931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975813#comment-16975813 ] Maxim Gekk commented on SPARK-29931: [~rxin] [~lixiao] [~srowen] [~dongjoon] [~cloud_fan] [~hyukjin.kwon] Does this make sense for you? > Declare all SQL legacy configs as will be removed in Spark 4.0 > -- > > Key: SPARK-29931 > URL: https://issues.apache.org/jira/browse/SPARK-29931 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > > Add the sentence to descriptions of all legacy SQL configs existed before > Spark 3.0: "This config will be removed in Spark 4.0.". Here is the list of > such configs: > * spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName > * spark.sql.legacy.literal.pickMinimumPrecision > * spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation > * spark.sql.legacy.sizeOfNull > * spark.sql.legacy.replaceDatabricksSparkAvro.enabled > * spark.sql.legacy.setopsPrecedence.enabled > * spark.sql.legacy.integralDivide.returnBigint > * spark.sql.legacy.bucketedTableScan.outputOrdering > * spark.sql.legacy.parser.havingWithoutGroupByAsWhere > * spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue > * spark.sql.legacy.setCommandRejectsSparkCoreConfs > * spark.sql.legacy.utcTimestampFunc.enabled > * spark.sql.legacy.typeCoercion.datetimeToString > * spark.sql.legacy.looseUpcast > * spark.sql.legacy.ctePrecedence.enabled > * spark.sql.legacy.arrayExistsFollowsThreeValuedLogic -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29931) Declare all SQL legacy configs as will be removed in Spark 4.0
[ https://issues.apache.org/jira/browse/SPARK-29931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975944#comment-16975944 ] Maxim Gekk commented on SPARK-29931: > It's conceivable there could be a reason to do it later, or sooner. Later is not a problem; sooner is what worries me. Most of the configs were added for Spark 3.0. If one of them were removed in a minor release between 3.0 and 4.0, that could break user apps, which I believe is unacceptable for minor releases. > Declare all SQL legacy configs as will be removed in Spark 4.0 > -- > > Key: SPARK-29931 > URL: https://issues.apache.org/jira/browse/SPARK-29931 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > > Add the sentence to descriptions of all legacy SQL configs existed before > Spark 3.0: "This config will be removed in Spark 4.0.". Here is the list of > such configs: > * spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName > * spark.sql.legacy.literal.pickMinimumPrecision > * spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation > * spark.sql.legacy.sizeOfNull > * spark.sql.legacy.replaceDatabricksSparkAvro.enabled > * spark.sql.legacy.setopsPrecedence.enabled > * spark.sql.legacy.integralDivide.returnBigint > * spark.sql.legacy.bucketedTableScan.outputOrdering > * spark.sql.legacy.parser.havingWithoutGroupByAsWhere > * spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue > * spark.sql.legacy.setCommandRejectsSparkCoreConfs > * spark.sql.legacy.utcTimestampFunc.enabled > * spark.sql.legacy.typeCoercion.datetimeToString > * spark.sql.legacy.looseUpcast > * spark.sql.legacy.ctePrecedence.enabled > * spark.sql.legacy.arrayExistsFollowsThreeValuedLogic -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29933) ThriftServerQueryTestSuite runs tests with wrong settings
Maxim Gekk created SPARK-29933: -- Summary: ThriftServerQueryTestSuite runs tests with wrong settings Key: SPARK-29933 URL: https://issues.apache.org/jira/browse/SPARK-29933 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk ThriftServerQueryTestSuite must run ANSI tests in the Spark dialect, but it keeps settings from previous runs; in fact, it runs `ansi/interval.sql` in the PostgreSQL dialect. See https://github.com/apache/spark/pull/26473#issuecomment-554510643 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
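One possible direction (an assumption on my part, not the proposed patch): clear session-level settings through the suite's JDBC statement before each test file, so a dialect set by a previous file cannot leak into the next one.
{code:scala}
// Hypothetical helper for the suite: RESET is a Spark SQL command that
// restores all modified runtime session configs to their defaults.
def withDefaultConfs(statement: java.sql.Statement)(body: => Unit): Unit = {
  statement.execute("RESET")
  body
}
{code}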
[jira] [Updated] (SPARK-29933) ThriftServerQueryTestSuite runs tests with wrong settings
[ https://issues.apache.org/jira/browse/SPARK-29933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-29933: --- Attachment: filter_tests.patch > ThriftServerQueryTestSuite runs tests with wrong settings > - > > Key: SPARK-29933 > URL: https://issues.apache.org/jira/browse/SPARK-29933 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > Attachments: filter_tests.patch > > > ThriftServerQueryTestSuite must run ANSI tests in the Spark dialect but it > keeps settings from previous runs. And in fact, it run `ansi/interval.sql` in > the PostgreSQL dialect. See > https://github.com/apache/spark/pull/26473#issuecomment-554510643 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29758) json_tuple truncates fields
[ https://issues.apache.org/jira/browse/SPARK-29758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976099#comment-16976099 ] Maxim Gekk commented on SPARK-29758: I have reproduced the issue on 2.4. The problem is in Jackson core 2.6.7. It was fixed by https://github.com/FasterXML/jackson-core/commit/554f8db0f940b2a53f974852a2af194739d65200#diff-7990edc67621822770cdc62e12d933d4R647-R650 in the version 2.7.7. We could try to back port this https://github.com/apache/spark/pull/21596 on 2.4. [~hyukjin.kwon] WDYT? > json_tuple truncates fields > --- > > Key: SPARK-29758 > URL: https://issues.apache.org/jira/browse/SPARK-29758 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.4 > Environment: EMR 5.15.0 (Spark 2.3.0) And MacBook Pro (Mojave > 10.14.3, Spark 2.4.4) > Jdk 8, Scala 2.11.12 >Reporter: Stanislav >Priority: Major > > `json_tuple` has inconsistent behaviour with `from_json` - but only if json > string is longer than 2700 characters or so. > This can be reproduced in spark-shell and on cluster, but not in scalatest, > for some reason. > {code} > import org.apache.spark.sql.functions.{from_json, json_tuple} > import org.apache.spark.sql.types._ > val counterstring = > "*3*5*7*9*12*15*18*21*24*27*30*33*36*39*42*45*48*51*54*57*60*63*66*69*72*75*78*81*84*87*90*93*96*99*103*107*111*115*119*123*127*131*135*139*143*147*151*155*159*163*167*171*175*179*183*187*191*195*199*203*207*211*215*219*223*227*231*235*239*243*247*251*255*259*263*267*271*275*279*283*287*291*295*299*303*307*311*315*319*323*327*331*335*339*343*347*351*355*359*363*367*371*375*379*383*387*391*395*399*403*407*411*415*419*423*427*431*435*439*443*447*451*455*459*463*467*471*475*479*483*487*491*495*499*503*507*511*515*519*523*527*531*535*539*543*547*551*555*559*563*567*571*575*579*583*587*591*595*599*603*607*611*615*619*623*627*631*635*639*643*647*651*655*659*663*667*671*675*679*683*687*691*695*699*703*707*711*715*719*723*727*731*735*739*743*747*751*755*759*763*767*771*775*779*783*787*791*795*799*803*807*811*815*819*823*827*831*835*839*843*847*851*855*859*863*867*871*875*879*883*887*891*895*899*903*907*911*915*919*923*927*931*935*939*943*947*951*955*959*963*967*971*975*979*983*987*991*995*1000*1005*1010*1015*1020*1025*1030*1035*1040*1045*1050*1055*1060*1065*1070*1075*1080*1085*1090*1095*1100*1105*1110*1115*1120*1125*1130*1135*1140*1145*1150*1155*1160*1165*1170*1175*1180*1185*1190*1195*1200*1205*1210*1215*1220*1225*1230*1235*1240*1245*1250*1255*1260*1265*1270*1275*1280*1285*1290*1295*1300*1305*1310*1315*1320*1325*1330*1335*1340*1345*1350*1355*1360*1365*1370*1375*1380*1385*1390*1395*1400*1405*1410*1415*1420*1425*1430*1435*1440*1445*1450*1455*1460*1465*1470*1475*1480*1485*1490*1495*1500*1505*1510*1515*1520*1525*1530*1535*1540*1545*1550*1555*1560*1565*1570*1575*1580*1585*1590*1595*1600*1605*1610*1615*1620*1625*1630*1635*1640*1645*1650*1655*1660*1665*1670*1675*1680*1685*1690*1695*1700*1705*1710*1715*1720*1725*1730*1735*1740*1745*1750*1755*1760*1765*1770*1775*1780*1785*1790*1795*1800*1805*1810*1815*1820*1825*1830*1835*1840*1845*1850*1855*1860*1865*1870*1875*1880*1885*1890*1895*1900*1905*1910*1915*1920*1925*1930*1935*1940*1945*1950*1955*1960*1965*1970*1975*1980*1985*1990*1995*2000*2005*2010*2015*2020*2025*2030*2035*2040*2045*2050*2055*2060*2065*2070*2075*2080*2085*2090*2095*2100*2105*2110*2115*2120*2125*2130*2135*2140*2145*2150*2155*2160*2165*2170*2175*2180*2185*2190*2195*2200*2205*2210*2215*2220*2225*2230*2235*2240*2245*2250*2255*2260*2265*2270*2275*2280*2285*2290*
2295*2300*2305*2310*2315*2320*2325*2330*2335*2340*2345*2350*2355*2360*2365*2370*2375*2380*2385*2390*2395*2400*2405*2410*2415*2420*2425*2430*2435*2440*2445*2450*2455*2460*2465*2470*2475*2480*2485*2490*2495*2500*2505*2510*2515*2520*2525*2530*2535*2540*2545*2550*2555*2560*2565*2570*2575*2580*2585*2590*2595*2600*2605*2610*2615*2620*2625*2630*2635*2640*2645*2650*2655*2660*2665*2670*2675*2680*2685*2690*2695*2700*2705*2710*2715*2720*2725*2730*2735*2740*2745*2750*2755*2760*2765*2770*2775*2780*2785*2790*2795*2800*" > val json_tuple_result = Seq(s"""{"test":"$counterstring"}""").toDF("json") > .withColumn("result", json_tuple('json, "test")) > .select('result) > .as[String].head.length > val from_json_result = Seq(s"""{"test":"$counterstring"}""").toDF("json") > .withColumn("parsed", from_json('json, StructType(Seq(StructField("test", > StringType) > .withColumn("result", $"parsed.test") > .select('result) > .as[String].head.length > scala> json_tuple_result > res62: Int = 2791 > scala> from_json_result > res63: Int = 2800 > {code} > Result is influenced by the total length of the json string at the moment of > parsing: > {code} > val json_tuple_result_with_prefix = Seq(s"""{"prefix": "dummy", > "test":"$counterstring"}""").toDF("jso
[jira] [Commented] (SPARK-29575) from_json can produce nulls for fields which are marked as non-nullable
[ https://issues.apache.org/jira/browse/SPARK-29575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976102#comment-16976102 ] Maxim Gekk commented on SPARK-29575: This is intentional behavior: the user's schema is forcibly set to nullable. See SPARK-23173 > from_json can produce nulls for fields which are marked as non-nullable > --- > > Key: SPARK-29575 > URL: https://issues.apache.org/jira/browse/SPARK-29575 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.4 >Reporter: Victor Lopez >Priority: Major > > I believe this issue was resolved elsewhere > (https://issues.apache.org/jira/browse/SPARK-23173), though for Pyspark this > bug seems to still be there. > The issue appears when using {{from_json}} to parse a column in a Spark > dataframe. It seems like {{from_json}} ignores whether the schema provided > has any {{nullable:False}} property. > {code:java} > schema = T.StructType().add(T.StructField('id', T.LongType(), > nullable=False)).add(T.StructField('name', T.StringType(), nullable=False)) > data = [{'user': str({'name': 'joe', 'id':1})}, {'user': str({'name': > 'jane'})}] > df = spark.read.json(sc.parallelize(data)) > df.withColumn("details", F.from_json("user", > schema)).select("details.*").show() > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
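A Scala equivalent of the report (my own illustration, not taken from the ticket) shows the same behavior: a field declared nullable = false still comes back as null, because from_json relaxes the user-supplied schema to nullable.
{code:scala}
// Assumes spark.implicits._ is in scope, as in spark-shell.
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

// Both fields are declared non-nullable, yet the record missing "id" parses
// to a row with a null id instead of failing.
val schema = new StructType()
  .add(StructField("id", LongType, nullable = false))
  .add(StructField("name", StringType, nullable = false))

Seq("""{"name": "jane"}""").toDF("user")
  .select(from_json($"user", schema).as("details"))
  .select("details.*")
  .show()
{code}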
[jira] [Commented] (SPARK-29758) json_tuple truncates fields
[ https://issues.apache.org/jira/browse/SPARK-29758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976106#comment-16976106 ] Maxim Gekk commented on SPARK-29758: Another solution is to remove this optimization: https://github.com/apache/spark/blob/v2.4.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L475-L478 > json_tuple truncates fields > --- > > Key: SPARK-29758 > URL: https://issues.apache.org/jira/browse/SPARK-29758 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.4 > Environment: EMR 5.15.0 (Spark 2.3.0) And MacBook Pro (Mojave > 10.14.3, Spark 2.4.4) > Jdk 8, Scala 2.11.12 >Reporter: Stanislav >Priority: Major > > `json_tuple` has inconsistent behaviour with `from_json` - but only if json > string is longer than 2700 characters or so. > This can be reproduced in spark-shell and on cluster, but not in scalatest, > for some reason. > {code} > import org.apache.spark.sql.functions.{from_json, json_tuple} > import org.apache.spark.sql.types._ > val counterstring = > "*3*5*7*9*12*15*18*21*24*27*30*33*36*39*42*45*48*51*54*57*60*63*66*69*72*75*78*81*84*87*90*93*96*99*103*107*111*115*119*123*127*131*135*139*143*147*151*155*159*163*167*171*175*179*183*187*191*195*199*203*207*211*215*219*223*227*231*235*239*243*247*251*255*259*263*267*271*275*279*283*287*291*295*299*303*307*311*315*319*323*327*331*335*339*343*347*351*355*359*363*367*371*375*379*383*387*391*395*399*403*407*411*415*419*423*427*431*435*439*443*447*451*455*459*463*467*471*475*479*483*487*491*495*499*503*507*511*515*519*523*527*531*535*539*543*547*551*555*559*563*567*571*575*579*583*587*591*595*599*603*607*611*615*619*623*627*631*635*639*643*647*651*655*659*663*667*671*675*679*683*687*691*695*699*703*707*711*715*719*723*727*731*735*739*743*747*751*755*759*763*767*771*775*779*783*787*791*795*799*803*807*811*815*819*823*827*831*835*839*843*847*851*855*859*863*867*871*875*879*883*887*891*895*899*903*907*911*915*919*923*927*931*935*939*943*947*951*955*959*963*967*971*975*979*983*987*991*995*1000*1005*1010*1015*1020*1025*1030*1035*1040*1045*1050*1055*1060*1065*1070*1075*1080*1085*1090*1095*1100*1105*1110*1115*1120*1125*1130*1135*1140*1145*1150*1155*1160*1165*1170*1175*1180*1185*1190*1195*1200*1205*1210*1215*1220*1225*1230*1235*1240*1245*1250*1255*1260*1265*1270*1275*1280*1285*1290*1295*1300*1305*1310*1315*1320*1325*1330*1335*1340*1345*1350*1355*1360*1365*1370*1375*1380*1385*1390*1395*1400*1405*1410*1415*1420*1425*1430*1435*1440*1445*1450*1455*1460*1465*1470*1475*1480*1485*1490*1495*1500*1505*1510*1515*1520*1525*1530*1535*1540*1545*1550*1555*1560*1565*1570*1575*1580*1585*1590*1595*1600*1605*1610*1615*1620*1625*1630*1635*1640*1645*1650*1655*1660*1665*1670*1675*1680*1685*1690*1695*1700*1705*1710*1715*1720*1725*1730*1735*1740*1745*1750*1755*1760*1765*1770*1775*1780*1785*1790*1795*1800*1805*1810*1815*1820*1825*1830*1835*1840*1845*1850*1855*1860*1865*1870*1875*1880*1885*1890*1895*1900*1905*1910*1915*1920*1925*1930*1935*1940*1945*1950*1955*1960*1965*1970*1975*1980*1985*1990*1995*2000*2005*2010*2015*2020*2025*2030*2035*2040*2045*2050*2055*2060*2065*2070*2075*2080*2085*2090*2095*2100*2105*2110*2115*2120*2125*2130*2135*2140*2145*2150*2155*2160*2165*2170*2175*2180*2185*2190*2195*2200*2205*2210*2215*2220*2225*2230*2235*2240*2245*2250*2255*2260*2265*2270*2275*2280*2285*2290*2295*2300*2305*2310*2315*2320*2325*2330*2335*2340*2345*2350*2355*2360*2365*2370*2375*2380*2385*2390*2395*2400*2405*2410*2415*2420*2425*2430*2435*2440*2445*245
0*2455*2460*2465*2470*2475*2480*2485*2490*2495*2500*2505*2510*2515*2520*2525*2530*2535*2540*2545*2550*2555*2560*2565*2570*2575*2580*2585*2590*2595*2600*2605*2610*2615*2620*2625*2630*2635*2640*2645*2650*2655*2660*2665*2670*2675*2680*2685*2690*2695*2700*2705*2710*2715*2720*2725*2730*2735*2740*2745*2750*2755*2760*2765*2770*2775*2780*2785*2790*2795*2800*" > val json_tuple_result = Seq(s"""{"test":"$counterstring"}""").toDF("json") > .withColumn("result", json_tuple('json, "test")) > .select('result) > .as[String].head.length > val from_json_result = Seq(s"""{"test":"$counterstring"}""").toDF("json") > .withColumn("parsed", from_json('json, StructType(Seq(StructField("test", > StringType) > .withColumn("result", $"parsed.test") > .select('result) > .as[String].head.length > scala> json_tuple_result > res62: Int = 2791 > scala> from_json_result > res63: Int = 2800 > {code} > Result is influenced by the total length of the json string at the moment of > parsing: > {code} > val json_tuple_result_with_prefix = Seq(s"""{"prefix": "dummy", > "test":"$counterstring"}""").toDF("json") > .withColumn("result", json_tuple('json, "test")) > .select('result) > .as[String].head.length > scala> json_tuple_result_with_prefix > res64: Int = 27
[jira] [Comment Edited] (SPARK-29758) json_tuple truncates fields
[ https://issues.apache.org/jira/browse/SPARK-29758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976106#comment-16976106 ] Maxim Gekk edited comment on SPARK-29758 at 11/17/19 6:17 PM: -- Another solution is to disable this optimization: [https://github.com/apache/spark/blob/v2.4.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L475-L478] was (Author: maxgekk): Another solution is to remove this optimization: https://github.com/apache/spark/blob/v2.4.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L475-L478 > json_tuple truncates fields > --- > > Key: SPARK-29758 > URL: https://issues.apache.org/jira/browse/SPARK-29758 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.4 > Environment: EMR 5.15.0 (Spark 2.3.0) And MacBook Pro (Mojave > 10.14.3, Spark 2.4.4) > Jdk 8, Scala 2.11.12 >Reporter: Stanislav >Priority: Major > > `json_tuple` has inconsistent behaviour with `from_json` - but only if json > string is longer than 2700 characters or so. > This can be reproduced in spark-shell and on cluster, but not in scalatest, > for some reason. > {code} > import org.apache.spark.sql.functions.{from_json, json_tuple} > import org.apache.spark.sql.types._ > val counterstring = > "*3*5*7*9*12*15*18*21*24*27*30*33*36*39*42*45*48*51*54*57*60*63*66*69*72*75*78*81*84*87*90*93*96*99*103*107*111*115*119*123*127*131*135*139*143*147*151*155*159*163*167*171*175*179*183*187*191*195*199*203*207*211*215*219*223*227*231*235*239*243*247*251*255*259*263*267*271*275*279*283*287*291*295*299*303*307*311*315*319*323*327*331*335*339*343*347*351*355*359*363*367*371*375*379*383*387*391*395*399*403*407*411*415*419*423*427*431*435*439*443*447*451*455*459*463*467*471*475*479*483*487*491*495*499*503*507*511*515*519*523*527*531*535*539*543*547*551*555*559*563*567*571*575*579*583*587*591*595*599*603*607*611*615*619*623*627*631*635*639*643*647*651*655*659*663*667*671*675*679*683*687*691*695*699*703*707*711*715*719*723*727*731*735*739*743*747*751*755*759*763*767*771*775*779*783*787*791*795*799*803*807*811*815*819*823*827*831*835*839*843*847*851*855*859*863*867*871*875*879*883*887*891*895*899*903*907*911*915*919*923*927*931*935*939*943*947*951*955*959*963*967*971*975*979*983*987*991*995*1000*1005*1010*1015*1020*1025*1030*1035*1040*1045*1050*1055*1060*1065*1070*1075*1080*1085*1090*1095*1100*1105*1110*1115*1120*1125*1130*1135*1140*1145*1150*1155*1160*1165*1170*1175*1180*1185*1190*1195*1200*1205*1210*1215*1220*1225*1230*1235*1240*1245*1250*1255*1260*1265*1270*1275*1280*1285*1290*1295*1300*1305*1310*1315*1320*1325*1330*1335*1340*1345*1350*1355*1360*1365*1370*1375*1380*1385*1390*1395*1400*1405*1410*1415*1420*1425*1430*1435*1440*1445*1450*1455*1460*1465*1470*1475*1480*1485*1490*1495*1500*1505*1510*1515*1520*1525*1530*1535*1540*1545*1550*1555*1560*1565*1570*1575*1580*1585*1590*1595*1600*1605*1610*1615*1620*1625*1630*1635*1640*1645*1650*1655*1660*1665*1670*1675*1680*1685*1690*1695*1700*1705*1710*1715*1720*1725*1730*1735*1740*1745*1750*1755*1760*1765*1770*1775*1780*1785*1790*1795*1800*1805*1810*1815*1820*1825*1830*1835*1840*1845*1850*1855*1860*1865*1870*1875*1880*1885*1890*1895*1900*1905*1910*1915*1920*1925*1930*1935*1940*1945*1950*1955*1960*1965*1970*1975*1980*1985*1990*1995*2000*2005*2010*2015*2020*2025*2030*2035*2040*2045*2050*2055*2060*2065*2070*2075*2080*2085*2090*2095*2100*2105*2110*2115*2120*2125*2130*2135*2140*2145*2150*2155*2160*2165*2170*2175*2180*2185*2190*2195*2200
*2205*2210*2215*2220*2225*2230*2235*2240*2245*2250*2255*2260*2265*2270*2275*2280*2285*2290*2295*2300*2305*2310*2315*2320*2325*2330*2335*2340*2345*2350*2355*2360*2365*2370*2375*2380*2385*2390*2395*2400*2405*2410*2415*2420*2425*2430*2435*2440*2445*2450*2455*2460*2465*2470*2475*2480*2485*2490*2495*2500*2505*2510*2515*2520*2525*2530*2535*2540*2545*2550*2555*2560*2565*2570*2575*2580*2585*2590*2595*2600*2605*2610*2615*2620*2625*2630*2635*2640*2645*2650*2655*2660*2665*2670*2675*2680*2685*2690*2695*2700*2705*2710*2715*2720*2725*2730*2735*2740*2745*2750*2755*2760*2765*2770*2775*2780*2785*2790*2795*2800*" > val json_tuple_result = Seq(s"""{"test":"$counterstring"}""").toDF("json") > .withColumn("result", json_tuple('json, "test")) > .select('result) > .as[String].head.length > val from_json_result = Seq(s"""{"test":"$counterstring"}""").toDF("json") > .withColumn("parsed", from_json('json, StructType(Seq(StructField("test", > StringType) > .withColumn("result", $"parsed.test") > .select('result) > .as[String].head.length > scala> json_tuple_result > res62: Int = 2791 > scala> from_json_result > res63: Int = 2800 > {code} > Result is influenced by the total length of the json string at the moment of > parsing: > {
[jira] [Created] (SPARK-29949) JSON/CSV formats timestamps incorrectly
Maxim Gekk created SPARK-29949: -- Summary: JSON/CSV formats timestamps incorrectly Key: SPARK-29949 URL: https://issues.apache.org/jira/browse/SPARK-29949 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.4 Reporter: Maxim Gekk For example, formatting a timestamp whose fraction is .123456 outputs .000123: {code} scala> val t = java.sql.Timestamp.valueOf("2019-11-18 11:56:00.123456") t: java.sql.Timestamp = 2019-11-18 11:56:00.123456 scala> Seq(t).toDF("t").select(to_json(struct($"t"), Map("timestampFormat" -> "yyyy-MM-dd HH:mm:ss.SSSSSS"))).show(false) +--------------------------------------------------+ |structstojson(named_struct(NamePlaceholder(), t))| +--------------------------------------------------+ |{"t":"2019-11-18 11:56:00.000123"} | +--------------------------------------------------+ {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
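The likely mechanism (my reading, not stated in the ticket): the formatter is SimpleDateFormat-based, and SimpleDateFormat's 'S' field means milliseconds, so the .123456 fraction is carried as 123 ms and a six-letter pattern zero-pads it to 000123. A plain-JVM sketch:
{code:scala}
import java.text.SimpleDateFormat

val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSSSSS")
val ts = java.sql.Timestamp.valueOf("2019-11-18 11:56:00.123456")
// The Timestamp carries 123 ms (the micros live in the separate nanos field),
// and the 6-letter 'S' pattern zero-pads that millisecond value to "000123".
println(fmt.format(ts))  // 2019-11-18 11:56:00.000123
{code}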
[jira] [Created] (SPARK-29963) Check formatting timestamps up to microsecond precision by JSON/CSV datasource
Maxim Gekk created SPARK-29963: -- Summary: Check formatting timestamps up to microsecond precision by JSON/CSV datasource Key: SPARK-29963 URL: https://issues.apache.org/jira/browse/SPARK-29963 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Port tests added for 2.4 by the commit: https://github.com/apache/spark/commit/47cb1f359af62383e24198dbbaa0b4503348cd04 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30165) Eliminate compilation warnings
Maxim Gekk created SPARK-30165: -- Summary: Eliminate compilation warnings Key: SPARK-30165 URL: https://issues.apache.org/jira/browse/SPARK-30165 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.0.0 Reporter: Maxim Gekk This is an umbrella ticket for sub-tasks for eliminating compilation warnings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30165) Eliminate compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-30165: --- Attachment: spark_warnings.txt > Eliminate compilation warnings > -- > > Key: SPARK-30165 > URL: https://issues.apache.org/jira/browse/SPARK-30165 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > Attachments: spark_warnings.txt > > > This is an umbrella ticket for sub-tasks for eliminating compilation > warnings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30165) Eliminate compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-30165: --- Description: This is an umbrella ticket for sub-tasks for eliminating compilation warnings. I dumped all warnings to the spark_warnings.txt file attached to the ticket. (was: This is an umbrella ticket for sub-tasks for eliminating compilation warnings. ) > Eliminate compilation warnings > -- > > Key: SPARK-30165 > URL: https://issues.apache.org/jira/browse/SPARK-30165 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > Attachments: spark_warnings.txt > > > This is an umbrella ticket for sub-tasks for eliminating compilation > warnings. I dumped all warnings to the spark_warnings.txt file attached to > the ticket. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30166) Eliminate compilation warnings in JSONOptions
Maxim Gekk created SPARK-30166: -- Summary: Eliminate compilation warnings in JSONOptions Key: SPARK-30166 URL: https://issues.apache.org/jira/browse/SPARK-30166 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Scala 2.12 outputs the following warnings for JSONOptions: {code} sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala Warning:Warning:line (137)Java enum ALLOW_NUMERIC_LEADING_ZEROS in Java enum Feature is deprecated: see corresponding Javadoc for more information. factory.configure(JsonParser.Feature.ALLOW_NUMERIC_LEADING_ZEROS, allowNumericLeadingZeros) Warning:Warning:line (138)Java enum ALLOW_NON_NUMERIC_NUMBERS in Java enum Feature is deprecated: see corresponding Javadoc for more information. factory.configure(JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS, allowNonNumericNumbers) Warning:Warning:line (139)Java enum ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER in Java enum Feature is deprecated: see corresponding Javadoc for more information. factory.configure(JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER, Warning:Warning:line (141)Java enum ALLOW_UNQUOTED_CONTROL_CHARS in Java enum Feature is deprecated: see corresponding Javadoc for more information. factory.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, allowUnquotedControlChars) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
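These deprecations point at Jackson 2.10's JsonReadFeature replacements. A hedged sketch of how the factory could be built without the deprecated JsonParser.Feature enums, assuming jackson-core 2.10+ is on the classpath (not claiming this is the committed fix):
{code:scala}
import com.fasterxml.jackson.core.{JsonFactory, JsonFactoryBuilder}
import com.fasterxml.jackson.core.json.JsonReadFeature

// Per-reader features moved from JsonParser.Feature to JsonReadFeature in
// Jackson 2.10; the builder applies them at factory construction time.
def buildJsonFactory(
    allowNumericLeadingZeros: Boolean,
    allowNonNumericNumbers: Boolean,
    allowBackslashEscapingAnyCharacter: Boolean,
    allowUnquotedControlChars: Boolean): JsonFactory = {
  new JsonFactoryBuilder()
    .configure(JsonReadFeature.ALLOW_LEADING_ZEROS_FOR_NUMBERS, allowNumericLeadingZeros)
    .configure(JsonReadFeature.ALLOW_NON_NUMERIC_NUMBERS, allowNonNumericNumbers)
    .configure(JsonReadFeature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER, allowBackslashEscapingAnyCharacter)
    .configure(JsonReadFeature.ALLOW_UNESCAPED_CONTROL_CHARS, allowUnquotedControlChars)
    .build()
}
{code}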
[jira] [Updated] (SPARK-30165) Eliminate compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-30165: --- Component/s: (was: Build) SQL > Eliminate compilation warnings > -- > > Key: SPARK-30165 > URL: https://issues.apache.org/jira/browse/SPARK-30165 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > Attachments: spark_warnings.txt > > > This is an umbrella ticket for sub-tasks for eliminating compilation > warnings. I dumped all warnings to the spark_warnings.txt file attached to > the ticket. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30165) Eliminate compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990925#comment-16990925 ] Maxim Gekk commented on SPARK-30165: [~aman_omer] Feel free to take a sub-set of warnings and create a sub-task to fix them. > Eliminate compilation warnings > -- > > Key: SPARK-30165 > URL: https://issues.apache.org/jira/browse/SPARK-30165 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > Attachments: spark_warnings.txt > > > This is an umbrella ticket for sub-tasks for eliminating compilation > warnings. I dumped all warnings to the spark_warnings.txt file attached to > the ticket. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30168) Eliminate warnings in Parquet datasource
Maxim Gekk created SPARK-30168: -- Summary: Eliminate warnings in Parquet datasource Key: SPARK-30168 URL: https://issues.apache.org/jira/browse/SPARK-30168 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk # sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala {code} Warning:Warning:line (120)class ParquetInputSplit in package hadoop is deprecated: see corresponding Javadoc for more information. Option[TimeZone]) => RecordReader[Void, T]): RecordReader[Void, T] = { Warning:Warning:line (125)class ParquetInputSplit in package hadoop is deprecated: see corresponding Javadoc for more information. new org.apache.parquet.hadoop.ParquetInputSplit( Warning:Warning:line (134)method readFooter in class ParquetFileReader is deprecated: see corresponding Javadoc for more information. ParquetFileReader.readFooter(conf, filePath, SKIP_ROW_GROUPS).getFileMetaData Warning:Warning:line (183)class ParquetInputSplit in package hadoop is deprecated: see corresponding Javadoc for more information. split: ParquetInputSplit, Warning:Warning:line (212)class ParquetInputSplit in package hadoop is deprecated: see corresponding Javadoc for more information. split: ParquetInputSplit, {code} # sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java {code} Warning:Warning:line (55)java: org.apache.parquet.hadoop.ParquetInputSplit in org.apache.parquet.hadoop has been deprecated Warning:Warning:line (95)java: org.apache.parquet.hadoop.ParquetInputSplit in org.apache.parquet.hadoop has been deprecated Warning:Warning:line (95)java: org.apache.parquet.hadoop.ParquetInputSplit in org.apache.parquet.hadoop has been deprecated Warning:Warning:line (97)java: getRowGroupOffsets() in org.apache.parquet.hadoop.ParquetInputSplit has been deprecated Warning:Warning:line (105)java: readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated Warning:Warning:line (108)java: filterRowGroups(org.apache.parquet.filter2.compat.FilterCompat.Filter,java.util.List,org.apache.parquet.schema.MessageType) in org.apache.parquet.filter2.compat.RowGroupFilter has been deprecated Warning:Warning:line (111)java: readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated Warning:Warning:line (147)java: ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.parquet.hadoop.metadata.FileMetaData,org.apache.hadoop.fs.Path,java.util.List,java.util.List) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated Warning:Warning:line (203)java: readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated Warning:Warning:line (226)java: ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.parquet.hadoop.metadata.FileMetaData,org.apache.hadoop.fs.Path,java.util.List,java.util.List) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated {code} # sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCompatibilityTest.scala # 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetInteroperabilitySuite.scala # sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTest.scala # sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
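For the readFooter deprecations specifically, here is a sketch of the non-deprecated InputFile-based path (my assumption about the direction, not the actual patch):
{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.ParquetReadOptions
import org.apache.parquet.format.converter.ParquetMetadataConverter.SKIP_ROW_GROUPS
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile

// Read the footer through HadoopInputFile + ParquetReadOptions instead of the
// deprecated static ParquetFileReader.readFooter(conf, path, SKIP_ROW_GROUPS).
def readFileMetaData(conf: Configuration, filePath: Path) = {
  val reader = ParquetFileReader.open(
    HadoopInputFile.fromPath(filePath, conf),
    ParquetReadOptions.builder().withMetadataFilter(SKIP_ROW_GROUPS).build())
  try reader.getFooter.getFileMetaData finally reader.close()
}
{code}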
[jira] [Updated] (SPARK-30166) Eliminate warnings in JSONOptions
[ https://issues.apache.org/jira/browse/SPARK-30166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-30166: --- Summary: Eliminate warnings in JSONOptions (was: Eliminate compilation warnings in JSONOptions) > Eliminate warnings in JSONOptions > - > > Key: SPARK-30166 > URL: https://issues.apache.org/jira/browse/SPARK-30166 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > > Scala 2.12 outputs the following warnings for JSONOptions: > {code} > sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala > Warning:Warning:line (137)Java enum ALLOW_NUMERIC_LEADING_ZEROS in Java > enum Feature is deprecated: see corresponding Javadoc for more information. > factory.configure(JsonParser.Feature.ALLOW_NUMERIC_LEADING_ZEROS, > allowNumericLeadingZeros) > Warning:Warning:line (138)Java enum ALLOW_NON_NUMERIC_NUMBERS in Java > enum Feature is deprecated: see corresponding Javadoc for more information. > factory.configure(JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS, > allowNonNumericNumbers) > Warning:Warning:line (139)Java enum > ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER in Java enum Feature is deprecated: > see corresponding Javadoc for more information. > > factory.configure(JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER, > Warning:Warning:line (141)Java enum ALLOW_UNQUOTED_CONTROL_CHARS in Java > enum Feature is deprecated: see corresponding Javadoc for more information. > factory.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, > allowUnquotedControlChars) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30169) Eliminate warnings in Kafka connector
Maxim Gekk created SPARK-30169: -- Summary: Eliminate warnings in Kafka connector Key: SPARK-30169 URL: https://issues.apache.org/jira/browse/SPARK-30169 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Eliminate compilation warnings in the files: {code} external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/ConsumerStrategy.scala external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/DirectKafkaStreamSuite.scala external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30170) Eliminate warnings: part 1
Maxim Gekk created SPARK-30170: -- Summary: Eliminate warnings: part 1 Key: SPARK-30170 URL: https://issues.apache.org/jira/browse/SPARK-30170 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Eliminate compilation warnings in: # StopWordsRemoverSuite {code} Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) => Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) => Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) => Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) => Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) => Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) => Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) => Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) => {code} # MLTest.scala {code} Warning:Warning:line (88)match may not be exhaustive. It would fail on the following inputs: NumericAttribute(), UnresolvedAttribute val n = Attribute.fromStructField(dataframe.schema(colName)) match { {code} # FloatType.scala {code} Warning:Warning:line (81)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble. def quot(x: Float, y: Float): Float = (BigDecimal(x) quot BigDecimal(y)).floatValue Warning:Warning:line (81)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble. def quot(x: Float, y: Float): Float = (BigDecimal(x) quot BigDecimal(y)).floatValue Warning:Warning:line (82)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble. 
def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder BigDecimal(y)).floatValue Warning:Warning:line (82)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble. def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder BigDecimal(y)).floatValue {code} # AnalysisExternalCatalogSuite.scala {code} Warning:Warning:line (62)method verifyZeroInteractions in class Mockito is deprecated: see corresponding Javadoc for more information. verifyZeroInteractions(catalog) {code} # CSVExprUtilsSuite.scala {code} Warning:Warning:line (81)Octal escape literals are deprecated, use \u0000 instead. ("\0", Some("\u0000"), None) {code} # CollectionExpressionsSuite.scala, HashExpressionsSuite.scala, ExpressionParserSuite.scala {code} Warning:Warning:line (39)implicit conversion method stringToUTF8Str should be enabled by making the implicit value scala.language.implicitConversions visible. This can be achieved by adding the import clause 'import scala.language.implicitConversions' or by setting the compiler option -language:implicitConversions. See the
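For the FloatType.scala items, the fix suggested by the deprecation message itself is mechanical; a short sketch (my assumption, which may differ from the eventual patch):
{code:scala}
// BigDecimal.decimal(Float) builds the value from the float's decimal text
// representation, which is what the deprecation note recommends over BigDecimal(Float).
def quot(x: Float, y: Float): Float =
  (BigDecimal.decimal(x) quot BigDecimal.decimal(y)).floatValue
def rem(x: Float, y: Float): Float =
  (BigDecimal.decimal(x) remainder BigDecimal.decimal(y)).floatValue
{code}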
[jira] [Commented] (SPARK-30170) Eliminate warnings: part 1
[ https://issues.apache.org/jira/browse/SPARK-30170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990989#comment-16990989 ] Maxim Gekk commented on SPARK-30170: I am working on this > Eliminate warnings: part 1 > -- > > Key: SPARK-30170 > URL: https://issues.apache.org/jira/browse/SPARK-30170 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > > Eliminate compilation warnings in: > # StopWordsRemoverSuite > {code} > Warning:Warning:line (245)non-variable type argument String in type pattern > Seq[String] (the underlying of Seq[String]) is unchecked since it is > eliminated by erasure > case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: > Seq[String]) => > Warning:Warning:line (245)non-variable type argument String in type > pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is > eliminated by erasure > case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: > Seq[String]) => > Warning:Warning:line (245)non-variable type argument String in type > pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is > eliminated by erasure > case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: > Seq[String]) => > Warning:Warning:line (245)non-variable type argument String in type > pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is > eliminated by erasure > case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: > Seq[String]) => > Warning:Warning:line (271)non-variable type argument String in type > pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is > eliminated by erasure > case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: > Seq[String]) => > Warning:Warning:line (271)non-variable type argument String in type > pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is > eliminated by erasure > case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: > Seq[String]) => > Warning:Warning:line (271)non-variable type argument String in type > pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is > eliminated by erasure > case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: > Seq[String]) => > Warning:Warning:line (271)non-variable type argument String in type > pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is > eliminated by erasure > case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: > Seq[String]) => > {code} > # MLTest.scala > {code} > Warning:Warning:line (88)match may not be exhaustive. > It would fail on the following inputs: NumericAttribute(), UnresolvedAttribute > val n = Attribute.fromStructField(dataframe.schema(colName)) match { > {code} > # FloatType.scala > {code} > Warning:Warning:line (81)method apply in object BigDecimal is deprecated > (since 2.11.0): The default conversion from Float may not do what you want. > Use BigDecimal.decimal for a String representation, or explicitly convert the > Float with .toDouble. > def quot(x: Float, y: Float): Float = (BigDecimal(x) quot > BigDecimal(y)).floatValue > Warning:Warning:line (81)method apply in object BigDecimal is deprecated > (since 2.11.0): The default conversion from Float may not do what you want. > Use BigDecimal.decimal for a String representation, or explicitly convert the > Float with .toDouble. 
> def quot(x: Float, y: Float): Float = (BigDecimal(x) quot > BigDecimal(y)).floatValue > Warning:Warning:line (82)method apply in object BigDecimal is deprecated > (since 2.11.0): The default conversion from Float may not do what you want. > Use BigDecimal.decimal for a String representation, or explicitly convert the > Float with .toDouble. > def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder > BigDecimal(y)).floatValue > Warning:Warning:line (82)method apply in object BigDecimal is deprecated > (since 2.11.0): The default conversion from Float may not do what you want. > Use BigDecimal.decimal for a String representation, or explicitly convert the > Float with .toDouble. > def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder > BigDecimal(y)).floatValue > {code} > # AnalysisExternalCatalogSuite.scala > {code} > Warning:Warning:line (62)method verifyZeroInteractions in class Mockito is > deprecated: see corresponding Javadoc for more information. > verifyZeroInteractions(catalog) > {code} > # CSVExprUtilsSuite.scala > {code} > Warning:Warning:line (81)Octal escape literals are deprecated, use \u > instead. > ("\0", Some("\u"), None) > {c
[jira] [Updated] (SPARK-30170) Eliminate warnings: part 1
[ https://issues.apache.org/jira/browse/SPARK-30170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-30170: --- Description: Eliminate compilation warnings in: # StopWordsRemoverSuite {code:java} Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) => Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) => Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) => Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) => Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) => Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) => Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) => Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) => {code} # MLTest.scala {code:java} Warning:Warning:line (88)match may not be exhaustive. It would fail on the following inputs: NumericAttribute(), UnresolvedAttribute val n = Attribute.fromStructField(dataframe.schema(colName)) match { {code} # FloatType.scala {code:java} Warning:Warning:line (81)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble. def quot(x: Float, y: Float): Float = (BigDecimal(x) quot BigDecimal(y)).floatValue Warning:Warning:line (81)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble. def quot(x: Float, y: Float): Float = (BigDecimal(x) quot BigDecimal(y)).floatValue Warning:Warning:line (82)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble. 
def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder BigDecimal(y)).floatValue Warning:Warning:line (82)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble. def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder BigDecimal(y)).floatValue {code} # AnalysisExternalCatalogSuite.scala {code:java} Warning:Warning:line (62)method verifyZeroInteractions in class Mockito is deprecated: see corresponding Javadoc for more information. verifyZeroInteractions(catalog) {code} # CSVExprUtilsSuite.scala {code:java} Warning:Warning:line (81)Octal escape literals are deprecated, use \u0000 instead. ("\0", Some("\u0000"), None) {code} # CollectionExpressionsSuite.scala, HashExpressionsSuite.scala, ExpressionParserSuite.scala {code:java} Warning:Warning:line (39)implicit conversion method stringToUTF8Str should be enabled by making the implicit value scala.language.implicitConversions visible. This can be achieved by adding the import clause 'import scala.language.implicitConversions' or by setting the compiler option -language:implicitConversions. See the Scaladoc for value scala.language.implicitConversions for a discussion why the feature should be explicitly enabled.
[jira] [Commented] (SPARK-30165) Eliminate compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994806#comment-16994806 ] Maxim Gekk commented on SPARK-30165: > Are you sure on these? I am almost sure we can fix Parquet and Kafka related warnings. Not sure about warnings coming from deprecated Spark API. Maybe it is possible to suppress such warnings in tests. In any case, we know in advance that we test deprecated API. Such warnings don't guard us from mistakes. I quickly googled and found this [https://github.com/scala/bug/issues/7934#issuecomment-292425679] . Maybe we can use the approach in tests. > Eliminate compilation warnings > -- > > Key: SPARK-30165 > URL: https://issues.apache.org/jira/browse/SPARK-30165 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > Attachments: spark_warnings.txt > > > This is an umbrella ticket for sub-tasks for eliminating compilation > warnings. I dumped all warnings to the spark_warnings.txt file attached to > the ticket. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
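For reference, the workaround discussed in the linked scala/bug thread relies on two compiler behaviors: code inside a definition that is itself marked `@deprecated` may call deprecated API without warnings, and an object extending a deprecated class is not flagged either. Below is a minimal sketch of that pattern (assumed from the linked discussion, not taken from Spark's sources); on newer Scala versions `scala.annotation.nowarn` would be the cleaner option.
{code:scala}
object DeprecatedApi {
  @deprecated("use newApi instead", "3.0.0")
  def oldApi(x: Int): Int = x + 1
}

// Calls made from a @deprecated definition do not produce deprecation
// warnings, and an object extending a deprecated class is not flagged,
// so tests can exercise the old API through this wrapper.
@deprecated("exists only to silence deprecation warnings in tests", "3.0.0")
class SilencesDeprecations {
  def callOldApi(x: Int): Int = DeprecatedApi.oldApi(x)
}
object SilencesDeprecations extends SilencesDeprecations

object DeprecationSilencingExample extends App {
  // No -deprecation warning is emitted at this call site.
  println(SilencesDeprecations.callOldApi(41))
}
{code}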
[jira] [Commented] (SPARK-30168) Eliminate warnings in Parquet datasource
[ https://issues.apache.org/jira/browse/SPARK-30168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995754#comment-16995754 ] Maxim Gekk commented on SPARK-30168: [~Ankitraj] Go ahead. > Eliminate warnings in Parquet datasource > > > Key: SPARK-30168 > URL: https://issues.apache.org/jira/browse/SPARK-30168 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > > # > sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala > {code} > Warning:Warning:line (120)class ParquetInputSplit in package hadoop is > deprecated: see corresponding Javadoc for more information. > Option[TimeZone]) => RecordReader[Void, T]): RecordReader[Void, T] > = { > Warning:Warning:line (125)class ParquetInputSplit in package hadoop is > deprecated: see corresponding Javadoc for more information. > new org.apache.parquet.hadoop.ParquetInputSplit( > Warning:Warning:line (134)method readFooter in class ParquetFileReader is > deprecated: see corresponding Javadoc for more information. > ParquetFileReader.readFooter(conf, filePath, > SKIP_ROW_GROUPS).getFileMetaData > Warning:Warning:line (183)class ParquetInputSplit in package hadoop is > deprecated: see corresponding Javadoc for more information. > split: ParquetInputSplit, > Warning:Warning:line (212)class ParquetInputSplit in package hadoop is > deprecated: see corresponding Javadoc for more information. > split: ParquetInputSplit, > {code} > # > sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java > {code} > Warning:Warning:line (55)java: org.apache.parquet.hadoop.ParquetInputSplit in > org.apache.parquet.hadoop has been deprecated > Warning:Warning:line (95)java: > org.apache.parquet.hadoop.ParquetInputSplit in org.apache.parquet.hadoop has > been deprecated > Warning:Warning:line (95)java: > org.apache.parquet.hadoop.ParquetInputSplit in org.apache.parquet.hadoop has > been deprecated > Warning:Warning:line (97)java: getRowGroupOffsets() in > org.apache.parquet.hadoop.ParquetInputSplit has been deprecated > Warning:Warning:line (105)java: > readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter) > in org.apache.parquet.hadoop.ParquetFileReader has been deprecated > Warning:Warning:line (108)java: > filterRowGroups(org.apache.parquet.filter2.compat.FilterCompat.Filter,java.util.List,org.apache.parquet.schema.MessageType) > in org.apache.parquet.filter2.compat.RowGroupFilter has been deprecated > Warning:Warning:line (111)java: > readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter) > in org.apache.parquet.hadoop.ParquetFileReader has been deprecated > Warning:Warning:line (147)java: > ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.parquet.hadoop.metadata.FileMetaData,org.apache.hadoop.fs.Path,java.util.List,java.util.List) > in org.apache.parquet.hadoop.ParquetFileReader has been deprecated > Warning:Warning:line (203)java: > readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter) > in org.apache.parquet.hadoop.ParquetFileReader has been deprecated > Warning:Warning:line (226)java: > 
ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.parquet.hadoop.metadata.FileMetaData,org.apache.hadoop.fs.Path,java.util.List,java.util.List) > in org.apache.parquet.hadoop.ParquetFileReader has been deprecated > {code} > # > sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCompatibilityTest.scala > # > sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetInteroperabilitySuite.scala > # > sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTest.scala > # > sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
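For the `readFooter` deprecations in particular, newer Parquet releases expose the footer via `ParquetFileReader.open` and `HadoopInputFile`. The sketch below is a hedged illustration of that replacement (metadata filters such as SKIP_ROW_GROUPS and the actual Spark call sites are omitted), not the real patch.
{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile

object FooterReaderSketch {
  // Reads the Parquet file metadata without the deprecated static
  // ParquetFileReader.readFooter(conf, path, filter) entry point.
  def createdBy(conf: Configuration, filePath: Path): String = {
    val reader = ParquetFileReader.open(HadoopInputFile.fromPath(filePath, conf))
    try {
      reader.getFooter.getFileMetaData.getCreatedBy
    } finally {
      reader.close()
    }
  }
}
{code}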
[jira] [Created] (SPARK-30258) Eliminate warnings of depracted Spark APIs in tests
Maxim Gekk created SPARK-30258: -- Summary: Eliminate warnings of depracted Spark APIs in tests Key: SPARK-30258 URL: https://issues.apache.org/jira/browse/SPARK-30258 Project: Spark Issue Type: Sub-task Components: Tests Affects Versions: 3.0.0 Reporter: Maxim Gekk Suppress deprecation warnings in tests that check deprecated Spark APIs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30258) Eliminate warnings of deprecated Spark APIs in tests
[ https://issues.apache.org/jira/browse/SPARK-30258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-30258: --- Summary: Eliminate warnings of deprecated Spark APIs in tests (was: Eliminate warnings of depracted Spark APIs in tests) > Eliminate warnings of deprecated Spark APIs in tests > > > Key: SPARK-30258 > URL: https://issues.apache.org/jira/browse/SPARK-30258 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > > Suppress deprecation warnings in tests that check deprecated Spark APIs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30309) Mark `Filter` as a `sealed` class
Maxim Gekk created SPARK-30309: -- Summary: Mark `Filter` as a `sealed` class Key: SPARK-30309 URL: https://issues.apache.org/jira/browse/SPARK-30309 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Add the `sealed` keyword to the `Filter` class at the `org.apache.spark.sql.sources` package. So, the compiler should output a warning if handling of a filter is missed in a datasource: {code} Warning:(154, 65) match may not be exhaustive. It would fail on the following inputs: AlwaysFalse(), AlwaysTrue() def translate(filter: sources.Filter): Option[Expression] = filter match { {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
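As a standalone illustration of the effect (the tiny hierarchy below is a simplified stand-in, not the real org.apache.spark.sql.sources classes): once the base class is sealed, the compiler knows every possible subclass and warns when a match forgets one.
{code:scala}
// Simplified stand-in for the source filter hierarchy.
sealed abstract class Filter
case class EqualTo(attribute: String, value: Any) extends Filter
case class IsNull(attribute: String) extends Filter
case class AlwaysTrue() extends Filter
case class AlwaysFalse() extends Filter

object FilterTranslator {
  // Removing the AlwaysFalse case below makes the compiler report:
  //   "match may not be exhaustive. It would fail on the following inputs: AlwaysFalse()"
  // With an unsealed base class the compiler stays silent and the gap
  // only shows up at runtime as a MatchError.
  def translate(filter: Filter): Option[String] = filter match {
    case EqualTo(a, v) => Some(s"$a = $v")
    case IsNull(a) => Some(s"$a IS NULL")
    case _: AlwaysTrue => Some("TRUE")
    case _: AlwaysFalse => Some("FALSE")
  }
}
{code}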
[jira] [Created] (SPARK-30323) Support filters pushdown in CSV datasource
Maxim Gekk created SPARK-30323: -- Summary: Support filters pushdown in CSV datasource Key: SPARK-30323 URL: https://issues.apache.org/jira/browse/SPARK-30323 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk
- Implement the `SupportsPushDownFilters` interface in `CSVScanBuilder`
- Apply filters in UnivocityParser
- Change the UnivocityParser API: return Seq[InternalRow] from `convert()`
- Update CSVBenchmark
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
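For the first item above, a DSv2 scan builder advertises pushdown by mixing in `SupportsPushDownFilters`: it keeps the filters it can evaluate itself and hands the rest back to Spark. The skeleton below is a generic sketch against the Spark 3.0 connector API (the class name, the choice of supported filters, and the stubbed build() are illustrative, not the actual CSVScanBuilder change).
{code:scala}
import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownFilters}
import org.apache.spark.sql.sources.{Filter, IsNotNull}

// Hypothetical scan builder skeleton.
class MyCsvLikeScanBuilder extends ScanBuilder with SupportsPushDownFilters {
  private var _pushedFilters: Array[Filter] = Array.empty

  // Keep the filters the parser can evaluate while reading rows and return
  // the rest so Spark re-applies them on top of the scan.
  override def pushFilters(filters: Array[Filter]): Array[Filter] = {
    val (supported, unsupported) = filters.partition {
      case _: IsNotNull => true // only IsNotNull is "supported" in this sketch
      case _ => false
    }
    _pushedFilters = supported
    unsupported
  }

  override def pushedFilters(): Array[Filter] = _pushedFilters

  override def build(): Scan = {
    // A real implementation would pass _pushedFilters down to the row parser
    // (UnivocityParser in the CSV case) so non-matching rows are skipped early.
    throw new UnsupportedOperationException("sketch only")
  }
}
{code}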
[jira] [Created] (SPARK-30401) Call requireNonStaticConf() only once
Maxim Gekk created SPARK-30401: -- Summary: Call requireNonStaticConf() only once Key: SPARK-30401 URL: https://issues.apache.org/jira/browse/SPARK-30401 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.4 Reporter: Maxim Gekk The RuntimeConfig.requireNonStaticConf() method can be called 2 times for the same input: 1. Inside of set(key, true) 2. set(key, true) converts the second argument to a string and calls set(key, "true") where requireNonStaticConf() is invoked one more time -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
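In other words, the Boolean overload delegates to the String overload and both run the check. A hypothetical sketch of that call pattern (method bodies are illustrative, not the real RuntimeConfig code):
{code:scala}
object RuntimeConfigSketch {
  private def requireNonStaticConf(key: String): Unit =
    println(s"checking that '$key' is not a static SQL config")

  def set(key: String, value: Boolean): Unit = {
    requireNonStaticConf(key) // first check
    set(key, value.toString)  // delegates to the String overload ...
  }

  def set(key: String, value: String): Unit = {
    requireNonStaticConf(key) // ... which repeats the same check
    // the real implementation forwards the value to the underlying SQL conf
  }

  def main(args: Array[String]): Unit = {
    set("spark.sql.someConf", true) // prints the check message twice
  }
}
{code}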
[jira] [Commented] (SPARK-30401) Call requireNonStaticConf() only once
[ https://issues.apache.org/jira/browse/SPARK-30401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006392#comment-17006392 ] Maxim Gekk commented on SPARK-30401: I am working on it > Call requireNonStaticConf() only once > - > > Key: SPARK-30401 > URL: https://issues.apache.org/jira/browse/SPARK-30401 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.4 >Reporter: Maxim Gekk >Priority: Trivial > > The RuntimeConfig.requireNonStaticConf() method can be called 2 times for the > same input: > 1. Inside of set(, true) > 2. set() converts the second argument to a string and calls set(, > "true") where requireNonStaticConf() is invoked one more time -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30409) Use `NoOp` datasource in SQL benchmarks
Maxim Gekk created SPARK-30409: -- Summary: Use `NoOp` datasource in SQL benchmarks Key: SPARK-30409 URL: https://issues.apache.org/jira/browse/SPARK-30409 Project: Spark Issue Type: Test Components: SQL Affects Versions: 2.4.4 Reporter: Maxim Gekk Currently, SQL benchmarks use `count()`, `collect()` and `foreach(_ => ())` actions. The actions have additional overhead. For example, `collect()` converts column values to external type values and pulls data to the driver. Need to unify the benchmarks to use the `NoOp` datasource, except the benchmarks that specifically measure `count()` or `collect()`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
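For reference, the `noop` writer runs the whole query but discards the produced rows, so a benchmark measures only the query itself. A minimal usage sketch (benchmark scaffolding omitted; the dataset is arbitrary):
{code:scala}
import org.apache.spark.sql.SparkSession

object NoopBenchmarkSketch extends App {
  val spark = SparkSession.builder()
    .appName("noop-benchmark-sketch")
    .master("local[*]")
    .getOrCreate()

  val df = spark.range(0L, 100000000L, 1L, numPartitions = 8)
    .selectExpr("id", "id % 10 AS bucket")

  // Executes the plan end-to-end without collect()/count() overhead:
  // rows are produced on the executors and simply dropped by the sink.
  df.write.format("noop").mode("overwrite").save()

  spark.stop()
}
{code}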
[jira] [Commented] (SPARK-30171) Eliminate warnings: part2
[ https://issues.apache.org/jira/browse/SPARK-30171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006949#comment-17006949 ] Maxim Gekk commented on SPARK-30171: [~srowen] SPARK-30258 fixes warnings AvroFunctionsSuite.scala but not in parsedOptions.ignoreExtension . I am not sure how we can avoid warnings related to ignoreExtension. > Eliminate warnings: part2 > - > > Key: SPARK-30171 > URL: https://issues.apache.org/jira/browse/SPARK-30171 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Minor > > AvroFunctionsSuite.scala > Warning:Warning:line (41)method to_avro in package avro is deprecated (since > 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' instead. > val avroDF = df.select(to_avro('id).as("a"), to_avro('str).as("b")) > Warning:Warning:line (41)method to_avro in package avro is deprecated > (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' > instead. > val avroDF = df.select(to_avro('id).as("a"), to_avro('str).as("b")) > Warning:Warning:line (54)method from_avro in package avro is deprecated > (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.from_avro' > instead. > checkAnswer(avroDF.select(from_avro('a, avroTypeLong), from_avro('b, > avroTypeStr)), df) > Warning:Warning:line (54)method from_avro in package avro is deprecated > (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.from_avro' > instead. > checkAnswer(avroDF.select(from_avro('a, avroTypeLong), from_avro('b, > avroTypeStr)), df) > Warning:Warning:line (59)method to_avro in package avro is deprecated > (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' > instead. > val avroStructDF = df.select(to_avro('struct).as("avro")) > Warning:Warning:line (70)method from_avro in package avro is deprecated > (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.from_avro' > instead. > checkAnswer(avroStructDF.select(from_avro('avro, avroTypeStruct)), df) > Warning:Warning:line (76)method to_avro in package avro is deprecated > (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' > instead. > val avroStructDF = df.select(to_avro('struct).as("avro")) > Warning:Warning:line (118)method to_avro in package avro is deprecated > (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' > instead. > val readBackOne = dfOne.select(to_avro($"array").as("avro")) > Warning:Warning:line (119)method from_avro in package avro is deprecated > (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.from_avro' > instead. 
> .select(from_avro($"avro", avroTypeArrStruct).as("array")) > AvroPartitionReaderFactory.scala > Warning:Warning:line (64)value ignoreExtension in class AvroOptions is > deprecated (since 3.0): Use the general data source option pathGlobFilter for > filtering file names > if (parsedOptions.ignoreExtension || > partitionedFile.filePath.endsWith(".avro")) { > AvroFileFormat.scala > Warning:Warning:line (98)value ignoreExtension in class AvroOptions is > deprecated (since 3.0): Use the general data source option pathGlobFilter for > filtering file names > if (parsedOptions.ignoreExtension || file.filePath.endsWith(".avro")) { > AvroUtils.scala > Warning:Warning:line (55)value ignoreExtension in class AvroOptions is > deprecated (since 3.0): Use the general data source option pathGlobFilter for > filtering file names > inferAvroSchemaFromFiles(files, conf, parsedOptions.ignoreExtension, -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30172) Eliminate warnings: part3
[ https://issues.apache.org/jira/browse/SPARK-30172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006952#comment-17006952 ] Maxim Gekk commented on SPARK-30172: [~Ankitraj] Are you still working on this? > Eliminate warnings: part3 > - > > Key: SPARK-30172 > URL: https://issues.apache.org/jira/browse/SPARK-30172 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Minor > > /sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala > Warning:Warning:line (422)method initialize in class AbstractSerDe is > deprecated: see corresponding Javadoc for more information. > serde.initialize(null, properties) > /sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala > Warning:Warning:line (216)method initialize in class GenericUDTF is > deprecated: see corresponding Javadoc for more information. > protected lazy val outputInspector = > function.initialize(inputInspectors.toArray) > Warning:Warning:line (342)class UDAF in package exec is deprecated: see > corresponding Javadoc for more information. > new GenericUDAFBridge(funcWrapper.createFunction[UDAF]()) > Warning:Warning:line (503)trait AggregationBuffer in class > GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more > information. > def serialize(buffer: AggregationBuffer): Array[Byte] = { > Warning:Warning:line (523)trait AggregationBuffer in class > GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more > information. > def deserialize(bytes: Array[Byte]): AggregationBuffer = { > Warning:Warning:line (538)trait AggregationBuffer in class > GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more > information. > case class HiveUDAFBuffer(buf: AggregationBuffer, canDoMerge: Boolean) > Warning:Warning:line (538)trait AggregationBuffer in class > GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more > information. > case class HiveUDAFBuffer(buf: AggregationBuffer, canDoMerge: Boolean) > /sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkOrcNewRecordReader.java > Warning:Warning:line (44)java: getTypes() in org.apache.orc.Reader has > been deprecated > Warning:Warning:line (47)java: getTypes() in org.apache.orc.Reader has > been deprecated > /sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala > Warning:Warning:line (2,368)method readFooter in class ParquetFileReader > is deprecated: see corresponding Javadoc for more information. > val footer = ParquetFileReader.readFooter( > /sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDAFSuite.scala > Warning:Warning:line (202)trait AggregationBuffer in class > GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more > information. > override def getNewAggregationBuffer: AggregationBuffer = new > MockUDAFBuffer(0L, 0L) > Warning:Warning:line (204)trait AggregationBuffer in class > GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more > information. > override def reset(agg: AggregationBuffer): Unit = { > Warning:Warning:line (212)trait AggregationBuffer in class > GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more > information. > override def iterate(agg: AggregationBuffer, parameters: Array[AnyRef]): > Unit = { > Warning:Warning:line (221)trait AggregationBuffer in class > GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more > information. 
> override def merge(agg: AggregationBuffer, partial: Object): Unit = { > Warning:Warning:line (231)trait AggregationBuffer in class > GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more > information. > override def terminatePartial(agg: AggregationBuffer): AnyRef = { > Warning:Warning:line (236)trait AggregationBuffer in class > GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more > information. > override def terminate(agg: AggregationBuffer): AnyRef = > terminatePartial(agg) > Warning:Warning:line (257)trait AggregationBuffer in class > GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more > information. > override def getNewAggregationBuffer: AggregationBuffer = { > Warning:Warning:line (266)trait AggregationBuffer in class > GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more > information. > override def reset(agg: AggregationBuffer): Unit = { > Warning:Warning:line (277)trait AggregationBuffer in class > GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more > information. > override def iterate(agg: AggregationBuffer, parameters: Arr
[jira] [Commented] (SPARK-30174) Eliminate warnings :part 4
[ https://issues.apache.org/jira/browse/SPARK-30174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006953#comment-17006953 ] Maxim Gekk commented on SPARK-30174: [~shivuson...@gmail.com] Are you still working on this? If so, could you write in the ticket how are going to fix the warnings, please. > Eliminate warnings :part 4 > -- > > Key: SPARK-30174 > URL: https://issues.apache.org/jira/browse/SPARK-30174 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: jobit mathew >Priority: Minor > > sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala > {code:java} > Warning:Warning:line (127)value ENABLE_JOB_SUMMARY in class > ParquetOutputFormat is deprecated: see corresponding Javadoc for more > information. > && conf.get(ParquetOutputFormat.ENABLE_JOB_SUMMARY) == null) { > Warning:Warning:line (261)class ParquetInputSplit in package hadoop is > deprecated: see corresponding Javadoc for more information. > new org.apache.parquet.hadoop.ParquetInputSplit( > Warning:Warning:line (272)method readFooter in class ParquetFileReader is > deprecated: see corresponding Javadoc for more information. > ParquetFileReader.readFooter(sharedConf, filePath, > SKIP_ROW_GROUPS).getFileMetaData > Warning:Warning:line (442)method readFooter in class ParquetFileReader is > deprecated: see corresponding Javadoc for more information. > ParquetFileReader.readFooter( > {code} > sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetWriteBuilder.scala > {code:java} > Warning:Warning:line (91)value ENABLE_JOB_SUMMARY in class > ParquetOutputFormat is deprecated: see corresponding Javadoc for more > information. > && conf.get(ParquetOutputFormat.ENABLE_JOB_SUMMARY) == null) { > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30412) Eliminate warnings in Java tests regarding to deprecated API
Maxim Gekk created SPARK-30412: -- Summary: Eliminate warnings in Java tests regarding to deprecated API Key: SPARK-30412 URL: https://issues.apache.org/jira/browse/SPARK-30412 Project: Spark Issue Type: Sub-task Components: Java API, SQL Affects Versions: 2.4.4 Reporter: Maxim Gekk Suppress warnings about deprecated Spark API in Java test suites: {code} /Users/maxim/proj/eliminate-warnings-part2/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetAggregatorSuite.java Warning:Warning:line (32)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated Warning:Warning:line (91)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated Warning:Warning:line (100)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated Warning:Warning:line (109)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated Warning:Warning:line (118)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated {code} {code} /Users/maxim/proj/eliminate-warnings-part2/sql/core/src/test/java/test/org/apache/spark/sql/Java8DatasetAggregatorSuite.java Warning:Warning:line (28)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated Warning:Warning:line (37)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated Warning:Warning:line (46)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated Warning:Warning:line (55)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated Warning:Warning:line (64)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated {code} {code} /Users/maxim/proj/eliminate-warnings-part2/sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java Warning:Warning:line (478)java: json(org.apache.spark.api.java.JavaRDD) in org.apache.spark.sql.DataFrameReader has been deprecated {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33381) Unify DSv1 and DSv2 command tests
[ https://issues.apache.org/jira/browse/SPARK-33381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-33381: --- Summary: Unify DSv1 and DSv2 command tests (was: Unify dsv1 and dsv2 command tests) > Unify DSv1 and DSv2 command tests > - > > Key: SPARK-33381 > URL: https://issues.apache.org/jira/browse/SPARK-33381 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Create unified test suites for DSv1 and DSv2 commands like CREATE TABLE, SHOW > TABLES and etc. Put datasource specific tests to separate test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33381) Unify dsv1 and dsv2 command tests
Maxim Gekk created SPARK-33381: -- Summary: Unify dsv1 and dsv2 command tests Key: SPARK-33381 URL: https://issues.apache.org/jira/browse/SPARK-33381 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk Create unified test suites for DSv1 and DSv2 commands like CREATE TABLE, SHOW TABLES, etc. Put datasource-specific tests into separate test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33382) Unify v1 and v2 SHOW TABLES tests
Maxim Gekk created SPARK-33382: -- Summary: Unify v1 and v2 SHOW TABLES tests Key: SPARK-33382 URL: https://issues.apache.org/jira/browse/SPARK-33382 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk Gather common tests for the DSv1 and DSv2 SHOW TABLES command into a common test trait. Mix this trait into datasource-specific test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
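The layout this implies, sketched with plain ScalaTest (trait, suite, and member names here are invented for illustration, and the test bodies are stubbed because the real ones need a SparkSession fixture): a base trait holds the shared SHOW TABLES tests and each catalog flavor mixes it in with its own settings.
{code:scala}
import org.scalatest.funsuite.AnyFunSuite

// Shared tests parameterized by catalog and table provider.
trait ShowTablesSuiteBase { self: AnyFunSuite =>
  def catalogName: String
  def defaultUsing: String

  test(s"SHOW TABLES in $catalogName: lists an existing table") {
    // sql(s"CREATE TABLE $catalogName.ns.tbl (id bigint) $defaultUsing")
    // checkAnswer(sql(s"SHOW TABLES IN $catalogName.ns"), Row("ns", "tbl", false))
  }

  test(s"SHOW TABLES in $catalogName: pattern matching") {
    // sql(s"SHOW TABLES IN $catalogName.ns LIKE 'tb*'") ...
  }
}

// DSv1 flavor: the session catalog with a file-based provider.
class V1ShowTablesSuite extends AnyFunSuite with ShowTablesSuiteBase {
  override def catalogName: String = "spark_catalog"
  override def defaultUsing: String = "USING parquet"
}

// DSv2 flavor: an in-memory test catalog registered in the Spark conf.
class V2ShowTablesSuite extends AnyFunSuite with ShowTablesSuiteBase {
  override def catalogName: String = "test_catalog"
  override def defaultUsing: String = "USING _"
}
{code}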
[jira] [Created] (SPARK-33392) Align DSv2 commands to DSv1 implementation
Maxim Gekk created SPARK-33392: -- Summary: Align DSv2 commands to DSv1 implementation Key: SPARK-33392 URL: https://issues.apache.org/jira/browse/SPARK-33392 Project: Spark Issue Type: Umbrella Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk The purpose of this umbrella ticket is: # Implement missing features of datasource v1 commands in DSv2 # Align behavior of DSv2 commands to the current implementation of DSv1 commands as much as possible. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33305) DSv2: DROP TABLE command should also invalidate cache
[ https://issues.apache.org/jira/browse/SPARK-33305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-33305: --- Parent: SPARK-33392 Issue Type: Sub-task (was: Bug) > DSv2: DROP TABLE command should also invalidate cache > - > > Key: SPARK-33305 > URL: https://issues.apache.org/jira/browse/SPARK-33305 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.1 >Reporter: Chao Sun >Priority: Major > > Different from DSv1, {{DROP TABLE}} command in DSv2 currently only drops the > table but doesn't invalidate all caches referencing the table. We should make > the behavior consistent between v1 and v2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33364) Expose purge option in TableCatalog.dropTable
[ https://issues.apache.org/jira/browse/SPARK-33364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-33364: --- Parent: SPARK-33392 Issue Type: Sub-task (was: New Feature) > Expose purge option in TableCatalog.dropTable > - > > Key: SPARK-33364 > URL: https://issues.apache.org/jira/browse/SPARK-33364 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Minor > Fix For: 3.1.0 > > > TableCatalog.dropTable currently does not support the purge option. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33393) Support SHOW TABLE EXTENDED in DSv2
Maxim Gekk created SPARK-33393: -- Summary: Support SHOW TABLE EXTENDED in DSv2 Key: SPARK-33393 URL: https://issues.apache.org/jira/browse/SPARK-33393 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk Current implementation of DSv2 SHOW TABLE doesn't support the EXTENDED mode in: https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowTablesExec.scala#L33 which is supported in DSv1: https://github.com/apache/spark/blob/7e99fcd64efa425f3c985df4fe957a3be274a49a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L870 Need to add the same functionality to ShowTablesExec. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33394) Throw `NoSuchDatabaseException` for not existing namespace in DSv2 SHOW TABLES
Maxim Gekk created SPARK-33394: -- Summary: Throw `NoSuchDatabaseException` for not existing namespace in DSv2 SHOW TABLES Key: SPARK-33394 URL: https://issues.apache.org/jira/browse/SPARK-33394 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk The current implementation of DSv2 SHOW TABLES returns an empty result for a non-existent database/namespace. This implementation should be aligned with DSv1, which throws `NoSuchDatabaseException`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33403) DSv2 SHOW TABLES doesn't show `default`
Maxim Gekk created SPARK-33403: -- Summary: DSv2 SHOW TABLES doesn't show `default` Key: SPARK-33403 URL: https://issues.apache.org/jira/browse/SPARK-33403 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk DSv1:
{code:scala}
test("namespace is not specified and the default catalog is set") {
  withSQLConf(SQLConf.DEFAULT_CATALOG.key -> catalog) {
    withTable("table") {
      spark.sql(s"CREATE TABLE table (id bigint, data string) $defaultUsing")
      sql("SHOW TABLES").show()
    }
  }
}
{code}
{code}
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
| default|    table|      false|
+--------+---------+-----------+
{code}
DSv2:
{code}
+---------+---------+
|namespace|tableName|
+---------+---------+
|         |    table|
+---------+---------+
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33426) Unify Hive SHOW TABLES tests
Maxim Gekk created SPARK-33426: -- Summary: Unify Hive SHOW TABLES tests Key: SPARK-33426 URL: https://issues.apache.org/jira/browse/SPARK-33426 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk 1. Move Hive SHOW TABLES tests to a separate test suite 2. Extend the common SHOW TABLES trait by the new test suite -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33430) Support namespaces in JDBC v2 Table Catalog
Maxim Gekk created SPARK-33430: -- Summary: Support namespaces in JDBC v2 Table Catalog Key: SPARK-33430 URL: https://issues.apache.org/jira/browse/SPARK-33430 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk When I extend JDBCTableCatalogSuite by org.apache.spark.sql.execution.command.v2.ShowTablesSuite, for instance: {code:scala} import org.apache.spark.sql.execution.command.v2.ShowTablesSuite class JDBCTableCatalogSuite extends ShowTablesSuite { override def version: String = "JDBC V2" override def catalog: String = "h2" ... {code} some tests from JDBCTableCatalogSuite fail with: {code} [info] - SHOW TABLES JDBC V2: show an existing table *** FAILED *** (2 seconds, 502 milliseconds) [info] org.apache.spark.sql.AnalysisException: Cannot use catalog h2: does not support namespaces; [info] at org.apache.spark.sql.connector.catalog.CatalogV2Implicits$CatalogHelper.asNamespaceCatalog(CatalogV2Implicits.scala:83) [info] at org.apache.spark.sql.catalyst.analysis.ResolveCatalogs$$anonfun$apply$1.applyOrElse(ResolveCatalogs.scala:208) [info] at org.apache.spark.sql.catalyst.analysis.ResolveCatalogs$$anonfun$apply$1.applyOrElse(ResolveCatalogs.scala:34) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33430) Support namespaces in JDBC v2 Table Catalog
[ https://issues.apache.org/jira/browse/SPARK-33430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17230514#comment-17230514 ] Maxim Gekk commented on SPARK-33430: [~cloud_fan] [~huaxingao] It would be nice to support namespaces, WDYT? > Support namespaces in JDBC v2 Table Catalog > --- > > Key: SPARK-33430 > URL: https://issues.apache.org/jira/browse/SPARK-33430 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > When I extend JDBCTableCatalogSuite by > org.apache.spark.sql.execution.command.v2.ShowTablesSuite, for instance: > {code:scala} > import org.apache.spark.sql.execution.command.v2.ShowTablesSuite > class JDBCTableCatalogSuite extends ShowTablesSuite { > override def version: String = "JDBC V2" > override def catalog: String = "h2" > ... > {code} > some tests from JDBCTableCatalogSuite fail with: > {code} > [info] - SHOW TABLES JDBC V2: show an existing table *** FAILED *** (2 > seconds, 502 milliseconds) > [info] org.apache.spark.sql.AnalysisException: Cannot use catalog h2: does > not support namespaces; > [info] at > org.apache.spark.sql.connector.catalog.CatalogV2Implicits$CatalogHelper.asNamespaceCatalog(CatalogV2Implicits.scala:83) > [info] at > org.apache.spark.sql.catalyst.analysis.ResolveCatalogs$$anonfun$apply$1.applyOrElse(ResolveCatalogs.scala:208) > [info] at > org.apache.spark.sql.catalyst.analysis.ResolveCatalogs$$anonfun$apply$1.applyOrElse(ResolveCatalogs.scala:34) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33393) Support SHOW TABLE EXTENDED in DSv2
[ https://issues.apache.org/jira/browse/SPARK-33393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232094#comment-17232094 ] Maxim Gekk commented on SPARK-33393: I plan to work on this soon. > Support SHOW TABLE EXTENDED in DSv2 > --- > > Key: SPARK-33393 > URL: https://issues.apache.org/jira/browse/SPARK-33393 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Current implementation of DSv2 SHOW TABLE doesn't support the EXTENDED mode > in: > https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ShowTablesExec.scala#L33 > which is supported in DSv1: > https://github.com/apache/spark/blob/7e99fcd64efa425f3c985df4fe957a3be274a49a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L870 > Need to add the same functionality to ShowTablesExec. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33452) Create a V2 SHOW PARTITIONS execution node
[ https://issues.apache.org/jira/browse/SPARK-33452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232095#comment-17232095 ] Maxim Gekk commented on SPARK-33452: I plan to work on this soon. > Create a V2 SHOW PARTITIONS execution node > -- > > Key: SPARK-33452 > URL: https://issues.apache.org/jira/browse/SPARK-33452 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > There is the V1 SHOW PARTITIONS implementation: > https://github.com/apache/spark/blob/7e99fcd64efa425f3c985df4fe957a3be274a49a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L975 > The ticket aims to add V2 implementation with similar behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33452) Create a V2 SHOW PARTITIONS execution node
Maxim Gekk created SPARK-33452: -- Summary: Create a V2 SHOW PARTITIONS execution node Key: SPARK-33452 URL: https://issues.apache.org/jira/browse/SPARK-33452 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk There is the V1 SHOW PARTITIONS implementation: https://github.com/apache/spark/blob/7e99fcd64efa425f3c985df4fe957a3be274a49a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L975 The ticket aims to add V2 implementation with similar behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33453) Unify v1 and v2 SHOW PARTITIONS tests
Maxim Gekk created SPARK-33453: -- Summary: Unify v1 and v2 SHOW PARTITIONS tests Key: SPARK-33453 URL: https://issues.apache.org/jira/browse/SPARK-33453 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk Gather common tests for the DSv1 and DSv2 SHOW PARTITIONS command into a common test trait. Mix this trait into datasource-specific test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33505) Fix insert into `InMemoryPartitionTable`
Maxim Gekk created SPARK-33505: -- Summary: Fix insert into `InMemoryPartitionTable` Key: SPARK-33505 URL: https://issues.apache.org/jira/browse/SPARK-33505 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk Currently, INSERT INTO a partitioned table in the V2 in-memory catalog doesn't create partitions. The example below demonstrates the issue:
{code:scala}
test("insert into partitioned table") {
  val t = "testpart.ns1.ns2.tbl"
  withTable(t) {
    spark.sql(
      s"""
         |CREATE TABLE $t (id bigint, name string, data string)
         |USING foo
         |PARTITIONED BY (id, name)""".stripMargin)
    spark.sql(s"INSERT INTO $t PARTITION(id = 1, name = 'Max') SELECT 'abc'")
    val partTable = catalog("testpart").asTableCatalog
      .loadTable(Identifier.of(Array("ns1", "ns2"), "tbl")).asInstanceOf[InMemoryPartitionTable]
    assert(partTable.partitionExists(InternalRow.fromSeq(Seq(1, UTF8String.fromString("Max")))))
  }
}
{code}
The partitionExists() function returns false for the partitions that must be created. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33509) List partition by names from V2 tables that support partition management
Maxim Gekk created SPARK-33509: -- Summary: List partition by names from V2 tables that support partition management Key: SPARK-33509 URL: https://issues.apache.org/jira/browse/SPARK-33509 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk Currently, the SupportsPartitionManagement interface exposes only the listPartitionIdentifiers() method, which does not allow listing partitions by names. So, it is hard to implement: {code:java} SHOW PARTITIONS table PARTITION(month=2) {code} for a table like: {code:java} CREATE TABLE $table (price int, qty int, year int, month int) USING parquet partitioned by (year, month) {code} because listPartitionIdentifiers() requires specifying a value for *year*. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33511) Respect case sensitivity in resolving partition specs V2
Maxim Gekk created SPARK-33511: -- Summary: Respect case sensitivity in resolving partition specs V2 Key: SPARK-33511 URL: https://issues.apache.org/jira/browse/SPARK-33511 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk DSv1 DDL commands respect the SQL config spark.sql.caseSensitive, for example {code:java} spark-sql> CREATE TABLE tbl1 (id bigint, data string) USING parquet PARTITIONED BY (id); spark-sql> ALTER TABLE tbl1 ADD PARTITION (ID=1); spark-sql> SHOW PARTITIONS tbl1; id=1 {code} but the same ALTER TABLE command fails on DSv2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
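The expected resolution behavior, reduced to a tiny standalone sketch in plain Scala (no Spark internals): when spark.sql.caseSensitive is false, a user-supplied partition column name should be matched case-insensitively against the table's partition schema, which is what DSv1 already does.
{code:scala}
object PartitionSpecResolutionSketch extends App {
  val partitionColumns = Seq("id")

  // Mirrors what should happen for ALTER TABLE tbl1 ADD PARTITION (ID=1).
  def resolve(userColumn: String, caseSensitive: Boolean): Option[String] =
    partitionColumns.find { col =>
      if (caseSensitive) col == userColumn else col.equalsIgnoreCase(userColumn)
    }

  println(resolve("ID", caseSensitive = false)) // Some(id) -> the spec is accepted
  println(resolve("ID", caseSensitive = true))  // None     -> ID is not a partition column
}
{code}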
[jira] [Created] (SPARK-33521) Universal type conversion of V2 partition values
Maxim Gekk created SPARK-33521: -- Summary: Universal type conversion of V2 partition values Key: SPARK-33521 URL: https://issues.apache.org/jira/browse/SPARK-33521 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk Support other types while resolving partition specs in https://github.com/apache/spark/blob/23e9920b3910e4f05269853429c7f1cdc7b5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolvePartitionSpec.scala#L72 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33529) Resolver of V2 partition specs doesn't handle __HIVE_DEFAULT_PARTITION__
Maxim Gekk created SPARK-33529: -- Summary: Resolver of V2 partition specs doesn't handle __HIVE_DEFAULT_PARTITION__ Key: SPARK-33529 URL: https://issues.apache.org/jira/browse/SPARK-33529 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk The partition value '__HIVE_DEFAULT_PARTITION__' should be handled as null - the same as DSv1 does. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33529) Resolver of V2 partition specs doesn't handle __HIVE_DEFAULT_PARTITION__
[ https://issues.apache.org/jira/browse/SPARK-33529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-33529: --- Description: The partition value '__HIVE_DEFAULT_PARTITION__' should be handled as null - the same as DSv1 does. For example in DSv1: {code:java} spark-sql> CREATE TABLE tbl11 (id int, part0 string) USING parquet PARTITIONED BY (part0); spark-sql> ALTER TABLE tbl11 ADD PARTITION (part0 = '__HIVE_DEFAULT_PARTITION__'); spark-sql> INSERT INTO tbl11 PARTITION (part0='__HIVE_DEFAULT_PARTITION__') SELECT 1; spark-sql> SELECT * FROM tbl11; 1 NULL {code} was:The partition value '__HIVE_DEFAULT_PARTITION__' should be handled as null - the same as DSv1 does. > Resolver of V2 partition specs doesn't handle __HIVE_DEFAULT_PARTITION__ > > > Key: SPARK-33529 > URL: https://issues.apache.org/jira/browse/SPARK-33529 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > The partition value '__HIVE_DEFAULT_PARTITION__' should be handled as null - > the same as DSv1 does. > For example in DSv1: > {code:java} > spark-sql> CREATE TABLE tbl11 (id int, part0 string) USING parquet > PARTITIONED BY (part0); > spark-sql> ALTER TABLE tbl11 ADD PARTITION (part0 = > '__HIVE_DEFAULT_PARTITION__'); > spark-sql> INSERT INTO tbl11 PARTITION (part0='__HIVE_DEFAULT_PARTITION__') > SELECT 1; > spark-sql> SELECT * FROM tbl11; > 1 NULL > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33558) Unify v1 and v2 ALTER TABLE .. PARTITION tests
Maxim Gekk created SPARK-33558: -- Summary: Unify v1 and v2 ALTER TABLE .. PARTITION tests Key: SPARK-33558 URL: https://issues.apache.org/jira/browse/SPARK-33558 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk Extract ALTER TABLE .. PARTITION tests to a common place to run them for V1 and V2 datasources. Some tests can be placed into V1- and V2-specific test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33569) Remove getting partitions by only ident
Maxim Gekk created SPARK-33569: -- Summary: Remove getting partitions by only ident Key: SPARK-33569 URL: https://issues.apache.org/jira/browse/SPARK-33569 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk This is a follow-up of SPARK-33509, which added a function for getting partitions by names and ident. The function which gets partitions by ident is not used anymore, and it can be removed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33585) The comment for SQLContext.tables() doesn't mention the `database` column
Maxim Gekk created SPARK-33585: -- Summary: The comment for SQLContext.tables() doesn't mention the `database` column Key: SPARK-33585 URL: https://issues.apache.org/jira/browse/SPARK-33585 Project: Spark Issue Type: Documentation Components: SQL Affects Versions: 3.0.1, 2.4.7, 3.1.0 Reporter: Maxim Gekk The comment says: "The returned DataFrame has two columns, tableName and isTemporary": https://github.com/apache/spark/blob/b26ae98407c6c017a4061c0c420f48685ddd6163/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L664 but actually the dataframe has 3 columns:
{code:scala}
scala> spark.range(10).createOrReplaceTempView("view1")

scala> val tables = spark.sqlContext.tables()
tables: org.apache.spark.sql.DataFrame = [database: string, tableName: string ... 1 more field]

scala> tables.printSchema
root
 |-- database: string (nullable = false)
 |-- tableName: string (nullable = false)
 |-- isTemporary: boolean (nullable = false)

scala> tables.show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
| default|       t1|      false|
| default|       t2|      false|
| default|      ymd|      false|
|        |    view1|       true|
+--------+---------+-----------+
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33588) Partition spec in SHOW TABLE EXTENDED doesn't respect `spark.sql.caseSensitive`
Maxim Gekk created SPARK-33588: -- Summary: Partition spec in SHOW TABLE EXTENDED doesn't respect `spark.sql.caseSensitive` Key: SPARK-33588 URL: https://issues.apache.org/jira/browse/SPARK-33588 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.8, 3.0.2, 3.1.0 Reporter: Maxim Gekk For example: {code:sql} spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int) > USING parquet > partitioned by (year, month); spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1; spark-sql> SHOW TABLE EXTENDED LIKE 'tbl1' PARTITION(YEAR = 2015, Month = 1); Error in query: Partition spec is invalid. The spec (YEAR, Month) must match the partition spec (year, month) defined in table '`default`.`tbl1`'; {code} The spark.sql.caseSensitive flag is false by default, so, the partition spec is valid. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33585) The comment for SQLContext.tables() doesn't mention the `database` column
[ https://issues.apache.org/jira/browse/SPARK-33585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-33585: --- Affects Version/s: (was: 2.4.7) (was: 3.0.1) 3.0.2 2.4.8 > The comment for SQLContext.tables() doesn't mention the `database` column > - > > Key: SPARK-33585 > URL: https://issues.apache.org/jira/browse/SPARK-33585 > Project: Spark > Issue Type: Documentation > Components: SQL >Affects Versions: 2.4.8, 3.0.2, 3.1.0 >Reporter: Maxim Gekk >Priority: Minor > > The comment says: "The returned DataFrame has two columns, tableName and > isTemporary": > https://github.com/apache/spark/blob/b26ae98407c6c017a4061c0c420f48685ddd6163/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L664 > but actually the dataframe has 3 columns: > {code:scala} > scala> spark.range(10).createOrReplaceTempView("view1") > scala> val tables = spark.sqlContext.tables() > tables: org.apache.spark.sql.DataFrame = [database: string, tableName: string > ... 1 more field] > scala> tables.printSchema > root > |-- database: string (nullable = false) > |-- tableName: string (nullable = false) > |-- isTemporary: boolean (nullable = false) > scala> tables.show > ++-+---+ > |database|tableName|isTemporary| > ++-+---+ > | default| t1| false| > | default| t2| false| > | default| ymd| false| > ||view1| true| > ++-+---+ > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33591) NULL is recognized as the "null" string in partition specs
Maxim Gekk created SPARK-33591: -- Summary: NULL is recognized as the "null" string in partition specs Key: SPARK-33591 URL: https://issues.apache.org/jira/browse/SPARK-33591 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk For example: {code:sql} spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED BY (p1); spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0; spark-sql> SELECT isnull(p1) FROM tbl5; false {code} The *p1 = null* is not recognized as a partition with NULL value. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33591) NULL is recognized as the "null" string in partition specs
[ https://issues.apache.org/jira/browse/SPARK-33591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-33591: --- Issue Type: Improvement (was: Bug) > NULL is recognized as the "null" string in partition specs > -- > > Key: SPARK-33591 > URL: https://issues.apache.org/jira/browse/SPARK-33591 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > For example: > {code:sql} > spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED > BY (p1); > spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0; > spark-sql> SELECT isnull(p1) FROM tbl5; > false > {code} > The *p1 = null* is not recognized as a partition with NULL value. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33591) NULL is recognized as the "null" string in partition specs
[ https://issues.apache.org/jira/browse/SPARK-33591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-33591: --- Issue Type: Bug (was: Improvement) > NULL is recognized as the "null" string in partition specs > -- > > Key: SPARK-33591 > URL: https://issues.apache.org/jira/browse/SPARK-33591 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > For example: > {code:sql} > spark-sql> CREATE TABLE tbl5 (col1 INT, p1 STRING) USING PARQUET PARTITIONED > BY (p1); > spark-sql> INSERT INTO TABLE tbl5 PARTITION (p1 = null) SELECT 0; > spark-sql> SELECT isnull(p1) FROM tbl5; > false > {code} > The *p1 = null* is not recognized as a partition with NULL value. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly
[ https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241339#comment-17241339 ] Maxim Gekk commented on SPARK-33571: [~simonvanderveldt] Thank you for the detailed description and your investigation. Let me clarify a few things: > From our testing we're seeing several issues: > Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. that > contains fields of type `TimestampType` which contain timestamps before the > above mentioned moments in time without `datetimeRebaseModeInRead` set > doesn't raise the `SparkUpgradeException`, it succeeds without any changes to > the resulting dataframe compares to that dataframe in Spark 2.4.5 Spark 2.4.5 writes timestamps as parquet INT96 type. The SQL config `datetimeRebaseModeInRead` does not influence on reading such types in Spark 3.0.1, so, Spark performs rebasing always (LEGACY mode). We recently added separate configs for INT96: * https://github.com/apache/spark/pull/30056 * https://github.com/apache/spark/pull/30121 The changes will be released with Spark 3.1.0. > Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. that > contains fields of type `TimestampType` or `DateType` which contain dates or > timestamps before the above mentioned moments in time with > `datetimeRebaseModeInRead` set to `LEGACY` results in the same values in the > dataframe as when using `CORRECTED`, so it seems like no rebasing is > happening. For INT96, it seems it is correct behavior. We should observe different results for TIMESTAMP_MICROS and TIMESTAMP_MILLIS types, see the SQL config spark.sql.parquet.outputTimestampType. The DATE case is more interesting as we must see a difference in results for ancient dates. I will investigate this case. > Handling of hybrid to proleptic calendar when reading and writing Parquet > data not working correctly > > > Key: SPARK-33571 > URL: https://issues.apache.org/jira/browse/SPARK-33571 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Simon >Priority: Major > > The handling of old dates written with older Spark versions (<2.4.6) using > the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working > correctly. > From what I understand it should work like this: > * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before > 1900-01-01T00:00:00Z > * Only applies when reading or writing parquet files > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time a > `SparkUpgradeException` should be raised informing the user to choose either > `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead` > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time and > `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should > show the same values in Spark 3.0.1. with for example `df.show()` as they did > in Spark 2.4.5 > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time and > `datetimeRebaseModeInRead` is set to `CORRECTED` the dates and timestamps > should show different values in Spark 3.0.1. 
with for example `df.show()` as > they did in Spark 2.4.5 > * When writing parqet files with Spark > 3.0.0 which contain dates or > timestamps before the above mentioned moment in time a > `SparkUpgradeException` should be raised informing the user to choose either > `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite` > First of all I'm not 100% sure all of this is correct. I've been unable to > find any clear documentation on the expected behavior. The understanding I > have was pieced together from the mailing list > ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html)] > the blog post linked there and looking at the Spark code. > From our testing we're seeing several issues: > * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. > that contains fields of type `TimestampType` which contain timestamps before > the above mentioned moments in time without `datetimeRebaseModeInRead` set > doesn't raise the `SparkUpgradeException`, it succeeds without any changes to > the resulting dataframe compares to that dataframe in Spark 2.4.5 > * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. > that contains fields of type `TimestampType` or `DateType` which contain > dates or timestamps before the
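For reference, a minimal configuration sketch for the rebase mode discussed in the comment above. The 3.0.x config name is the one shown later in this thread; the 3.1.0 INT96 config name is an assumption based on the linked PRs.
{code:scala}
// Reading Spark 2.4.x parquet output with an explicit rebase mode in Spark 3.0.x.
// Note: per the comment above, this config does not affect INT96 timestamps written
// by Spark 2.4.x; dedicated INT96 configs only arrive in 3.1.0.
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")
val df = spark.read.parquet("/path/written/by/spark-2.4.5")
df.show(false)

// Spark 3.1.0+ (assumed config name, based on the PRs above):
// spark.conf.set("spark.sql.legacy.parquet.int96RebaseModeInRead", "LEGACY")
{code}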
[jira] [Commented] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly
[ https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241379#comment-17241379 ] Maxim Gekk commented on SPARK-33571: I have tried to reproduce the issue on the master branch by reading the file saved by Spark 2.4.5 (https://github.com/apache/spark/tree/master/sql/core/src/test/resources/test-data): {code:scala} test("SPARK-33571: read ancient dates saved by Spark 2.4.5") { withSQLConf(SQLConf.LEGACY_PARQUET_REBASE_MODE_IN_READ.key -> LEGACY.toString) { val path = getResourceParquetFilePath("test-data/before_1582_date_v2_4_5.snappy.parquet") val df = spark.read.parquet(path) df.show(false) } withSQLConf(SQLConf.LEGACY_PARQUET_REBASE_MODE_IN_READ.key -> CORRECTED.toString) { val path = getResourceParquetFilePath("test-data/before_1582_date_v2_4_5.snappy.parquet") val df = spark.read.parquet(path) df.show(false) } } {code} The results are different in LEGACY and in CORRECTED modes: {code} +--+--+ |dict |plain | +--+--+ |1001-01-01|1001-01-01| |1001-01-01|1001-01-02| |1001-01-01|1001-01-03| |1001-01-01|1001-01-04| |1001-01-01|1001-01-05| |1001-01-01|1001-01-06| |1001-01-01|1001-01-07| |1001-01-01|1001-01-08| +--+--+ +--+--+ |dict |plain | +--+--+ |1001-01-07|1001-01-07| |1001-01-07|1001-01-08| |1001-01-07|1001-01-09| |1001-01-07|1001-01-10| |1001-01-07|1001-01-11| |1001-01-07|1001-01-12| |1001-01-07|1001-01-13| |1001-01-07|1001-01-14| +--+--+ {code} > Handling of hybrid to proleptic calendar when reading and writing Parquet > data not working correctly > > > Key: SPARK-33571 > URL: https://issues.apache.org/jira/browse/SPARK-33571 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Simon >Priority: Major > > The handling of old dates written with older Spark versions (<2.4.6) using > the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working > correctly. > From what I understand it should work like this: > * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before > 1900-01-01T00:00:00Z > * Only applies when reading or writing parquet files > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time a > `SparkUpgradeException` should be raised informing the user to choose either > `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead` > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time and > `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should > show the same values in Spark 3.0.1. with for example `df.show()` as they did > in Spark 2.4.5 > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time and > `datetimeRebaseModeInRead` is set to `CORRECTED` the dates and timestamps > should show different values in Spark 3.0.1. with for example `df.show()` as > they did in Spark 2.4.5 > * When writing parqet files with Spark > 3.0.0 which contain dates or > timestamps before the above mentioned moment in time a > `SparkUpgradeException` should be raised informing the user to choose either > `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite` > First of all I'm not 100% sure all of this is correct. I've been unable to > find any clear documentation on the expected behavior. 
The understanding I > have was pieced together from the mailing list > ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html)] > the blog post linked there and looking at the Spark code. > From our testing we're seeing several issues: > * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. > that contains fields of type `TimestampType` which contain timestamps before > the above mentioned moments in time without `datetimeRebaseModeInRead` set > doesn't raise the `SparkUpgradeException`, it succeeds without any changes to > the resulting dataframe compares to that dataframe in Spark 2.4.5 > * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. > that contains fields of type `TimestampType` or `DateType` which contain > dates or timestamps before the above mentioned moments in time with > `datetimeRebaseModeInRead` set to `LEGACY` results in the same values in the > dataframe as when using `CORRECTED`, so it seems like no rebasing is > happening. > I've ma
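For reference, a small JDK-only sketch (my own illustration, not from the test suite; assumes UTC) of why LEGACY and CORRECTED differ by six days for the year 1001:
{code:scala}
// The day count that the hybrid calendar (Julian before 1582-10-15) labels 1001-01-01
// is labeled 1001-01-07 by the proleptic Gregorian calendar used by Spark 3.x,
// which matches the two outputs above.
import java.time.LocalDate
import java.util.{GregorianCalendar, TimeZone}

val cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
cal.clear()
cal.set(1001, 0, 1) // 1001-01-01 in the hybrid calendar (month is 0-based)
val epochDay = Math.floorDiv(cal.getTimeInMillis, 86400000L)
println(LocalDate.ofEpochDay(epochDay)) // 1001-01-07
{code}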
[jira] [Commented] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly
[ https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241400#comment-17241400 ] Maxim Gekk commented on SPARK-33571: Spark 3.0.1 shows different results as well: {code:scala} Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /___/ .__/\_,_/_/ /_/\_\ version 3.0.1 /_/ Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_275) scala> spark.read.parquet("/Users/maximgekk/proj/parquet-read-2_4_5_files/sql/core/src/test/resources/test-data/before_1582_date_v2_4_5.snappy.parquet").show(false) 20/12/01 12:31:59 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1) org.apache.spark.SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: reading dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z from Parquet files can be ambiguous, as the files may be written by Spark 2.x or legacy versions of Hive, which uses a legacy hybrid calendar that is different from Spark 3.0+'s Proleptic Gregorian calendar. See more details in SPARK-31404. You can set spark.sql.legacy.parquet.datetimeRebaseModeInRead to 'LEGACY' to rebase the datetime values w.r.t. the calendar difference during reading. Or set spark.sql.legacy.parquet.datetimeRebaseModeInRead to 'CORRECTED' to read the datetime values as it is. scala> spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY") scala> spark.read.parquet("/Users/maximgekk/proj/parquet-read-2_4_5_files/sql/core/src/test/resources/test-data/before_1582_date_v2_4_5.snappy.parquet").show(false) +--+--+ |dict |plain | +--+--+ |1001-01-01|1001-01-01| |1001-01-01|1001-01-02| |1001-01-01|1001-01-03| |1001-01-01|1001-01-04| |1001-01-01|1001-01-05| |1001-01-01|1001-01-06| |1001-01-01|1001-01-07| |1001-01-01|1001-01-08| +--+--+ scala> spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "CORRECTED") scala> spark.read.parquet("/Users/maximgekk/proj/parquet-read-2_4_5_files/sql/core/src/test/resources/test-data/before_1582_date_v2_4_5.snappy.parquet").show(false) +--+--+ |dict |plain | +--+--+ |1001-01-07|1001-01-07| |1001-01-07|1001-01-08| |1001-01-07|1001-01-09| |1001-01-07|1001-01-10| |1001-01-07|1001-01-11| |1001-01-07|1001-01-12| |1001-01-07|1001-01-13| |1001-01-07|1001-01-14| +--+--+ {code} > Handling of hybrid to proleptic calendar when reading and writing Parquet > data not working correctly > > > Key: SPARK-33571 > URL: https://issues.apache.org/jira/browse/SPARK-33571 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Simon >Priority: Major > > The handling of old dates written with older Spark versions (<2.4.6) using > the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working > correctly. 
> From what I understand it should work like this: > * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before > 1900-01-01T00:00:00Z > * Only applies when reading or writing parquet files > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time a > `SparkUpgradeException` should be raised informing the user to choose either > `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead` > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time and > `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should > show the same values in Spark 3.0.1. with for example `df.show()` as they did > in Spark 2.4.5 > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time and > `datetimeRebaseModeInRead` is set to `CORRECTED` the dates and timestamps > should show different values in Spark 3.0.1. with for example `df.show()` as > they did in Spark 2.4.5 > * When writing parqet files with Spark > 3.0.0 which contain dates or > timestamps before the above mentioned moment in time a > `SparkUpgradeException` should be raised informing the user to choose either > `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite` > First of all I'm not 100% sure all of this is correct. I've been unable to > find any clear documentation on the expected behavior. The understanding I > have was pieced together from the mailing list > ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html)] > the blog po
[jira] [Commented] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly
[ https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17241408#comment-17241408 ] Maxim Gekk commented on SPARK-33571: [~simonvanderveldt] Looking at the dates, you tested, both dates 1880-10-01 and 2020-10-01 belong to the Gregorian calendar, so, should be no diffs. For the date 0220-10-01, please, have a look at the table which I built in the PR: https://github.com/apache/spark/pull/28067 . The table shows that there is no diffs between 2 calendars for the year. > Handling of hybrid to proleptic calendar when reading and writing Parquet > data not working correctly > > > Key: SPARK-33571 > URL: https://issues.apache.org/jira/browse/SPARK-33571 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Simon >Priority: Major > > The handling of old dates written with older Spark versions (<2.4.6) using > the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working > correctly. > From what I understand it should work like this: > * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before > 1900-01-01T00:00:00Z > * Only applies when reading or writing parquet files > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time a > `SparkUpgradeException` should be raised informing the user to choose either > `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead` > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time and > `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should > show the same values in Spark 3.0.1. with for example `df.show()` as they did > in Spark 2.4.5 > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time and > `datetimeRebaseModeInRead` is set to `CORRECTED` the dates and timestamps > should show different values in Spark 3.0.1. with for example `df.show()` as > they did in Spark 2.4.5 > * When writing parqet files with Spark > 3.0.0 which contain dates or > timestamps before the above mentioned moment in time a > `SparkUpgradeException` should be raised informing the user to choose either > `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite` > First of all I'm not 100% sure all of this is correct. I've been unable to > find any clear documentation on the expected behavior. The understanding I > have was pieced together from the mailing list > ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html)] > the blog post linked there and looking at the Spark code. > From our testing we're seeing several issues: > * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. > that contains fields of type `TimestampType` which contain timestamps before > the above mentioned moments in time without `datetimeRebaseModeInRead` set > doesn't raise the `SparkUpgradeException`, it succeeds without any changes to > the resulting dataframe compares to that dataframe in Spark 2.4.5 > * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` or `DateType` which contain > dates or timestamps before the above mentioned moments in time with > `datetimeRebaseModeInRead` set to `LEGACY` results in the same values in the > dataframe as when using `CORRECTED`, so it seems like no rebasing is > happening. > I've made some scripts to help with testing/show the behavior, it uses > pyspark 2.4.5, 2.4.6 and 3.0.1. You can find them here > [https://github.com/simonvanderveldt/spark3-rebasemode-issue]. I'll post the > outputs in a comment below as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
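To double-check the claim about those particular dates, here is a small JDK-only predicate (my own sketch, not from the ticket; assumes UTC):
{code:scala}
// Returns true when a date has the same label in the hybrid calendar
// (Julian before 1582-10-15) and in the proleptic Gregorian calendar.
import java.time.LocalDate
import java.util.{GregorianCalendar, TimeZone}

def sameInBothCalendars(year: Int, month: Int, day: Int): Boolean = {
  val cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
  cal.clear()
  cal.set(year, month - 1, day) // hybrid calendar, month is 0-based
  val epochDay = Math.floorDiv(cal.getTimeInMillis, 86400000L)
  LocalDate.ofEpochDay(epochDay) == LocalDate.of(year, month, day)
}

Seq((1880, 10, 1), (2020, 10, 1), (220, 10, 1), (1001, 1, 1)).foreach { case (y, m, d) =>
  println(f"$y%04d-$m%02d-$d%02d same label in both calendars: ${sameInBothCalendars(y, m, d)}")
}
// 1880-10-01, 2020-10-01 and 0220-10-01 print true; 1001-01-01 prints false.
{code}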
[jira] [Created] (SPARK-33650) Misleading error from ALTER TABLE .. PARTITION for non-supported partition management table
Maxim Gekk created SPARK-33650: -- Summary: Misleading error from ALTER TABLE .. PARTITION for non-supported partition management table Key: SPARK-33650 URL: https://issues.apache.org/jira/browse/SPARK-33650 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk For a V2 table that doesn't support partition management, ALTER TABLE .. ADD/DROP PARTITION throws misleading exception: {code:java} PartitionSpecs are not resolved;; 'AlterTableAddPartition [UnresolvedPartitionSpec(Map(id -> 1),None)], false +- ResolvedTable org.apache.spark.sql.connector.InMemoryTableCatalog@2fd64b11, ns1.ns2.tbl, org.apache.spark.sql.connector.InMemoryTable@5d3ff859 org.apache.spark.sql.AnalysisException: PartitionSpecs are not resolved;; 'AlterTableAddPartition [UnresolvedPartitionSpec(Map(id -> 1),None)], false +- ResolvedTable org.apache.spark.sql.connector.InMemoryTableCatalog@2fd64b11, ns1.ns2.tbl, org.apache.spark.sql.connector.InMemoryTable@5d3ff859 at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:49) {code} The error should say that the table doesn't support partition management. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
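A hedged sketch of the user-visible shape of the problem (the catalog and table names below are hypothetical; the in-memory catalog in the stack trace is a test-only class):
{code:sql}
-- `testcat` is assumed to be a registered V2 catalog whose tables do not implement
-- partition management.
ALTER TABLE testcat.ns1.ns2.tbl ADD PARTITION (id = 1);
-- Today this fails with "PartitionSpecs are not resolved"; a clearer error would be
-- along the lines of "table ns1.ns2.tbl does not support partition management".
{code}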
[jira] [Commented] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly
[ https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243441#comment-17243441 ] Maxim Gekk commented on SPARK-33571: I opened the PR [https://github.com/apache/spark/pull/30596] with some improvements for config docs. [~hyukjin.kwon] [~cloud_fan] could you review it, please. > Handling of hybrid to proleptic calendar when reading and writing Parquet > data not working correctly > > > Key: SPARK-33571 > URL: https://issues.apache.org/jira/browse/SPARK-33571 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Simon >Priority: Major > > The handling of old dates written with older Spark versions (<2.4.6) using > the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working > correctly. > From what I understand it should work like this: > * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before > 1900-01-01T00:00:00Z > * Only applies when reading or writing parquet files > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time a > `SparkUpgradeException` should be raised informing the user to choose either > `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead` > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time and > `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should > show the same values in Spark 3.0.1. with for example `df.show()` as they did > in Spark 2.4.5 > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time and > `datetimeRebaseModeInRead` is set to `CORRECTED` the dates and timestamps > should show different values in Spark 3.0.1. with for example `df.show()` as > they did in Spark 2.4.5 > * When writing parqet files with Spark > 3.0.0 which contain dates or > timestamps before the above mentioned moment in time a > `SparkUpgradeException` should be raised informing the user to choose either > `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite` > First of all I'm not 100% sure all of this is correct. I've been unable to > find any clear documentation on the expected behavior. The understanding I > have was pieced together from the mailing list > ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html)] > the blog post linked there and looking at the Spark code. > From our testing we're seeing several issues: > * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. > that contains fields of type `TimestampType` which contain timestamps before > the above mentioned moments in time without `datetimeRebaseModeInRead` set > doesn't raise the `SparkUpgradeException`, it succeeds without any changes to > the resulting dataframe compares to that dataframe in Spark 2.4.5 > * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. > that contains fields of type `TimestampType` or `DateType` which contain > dates or timestamps before the above mentioned moments in time with > `datetimeRebaseModeInRead` set to `LEGACY` results in the same values in the > dataframe as when using `CORRECTED`, so it seems like no rebasing is > happening. > I've made some scripts to help with testing/show the behavior, it uses > pyspark 2.4.5, 2.4.6 and 3.0.1. 
You can find them here > [https://github.com/simonvanderveldt/spark3-rebasemode-issue]. I'll post the > outputs in a comment below as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33667) Respect case sensitivity in V1 SHOW PARTITIONS
Maxim Gekk created SPARK-33667: -- Summary: Respect case sensitivity in V1 SHOW PARTITIONS Key: SPARK-33667 URL: https://issues.apache.org/jira/browse/SPARK-33667 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.4.8, 3.0.2, 3.1.0 Reporter: Maxim Gekk SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config *spark.sql.caseSensitive* which is true by default, for instance: {code:sql} spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int) > USING parquet > PARTITIONED BY (year, month); spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1; spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1); Error in query: Non-partitioning column(s) [YEAR, Month] are specified for SHOW PARTITIONS; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33667) Respect case sensitivity in V1 SHOW PARTITIONS
[ https://issues.apache.org/jira/browse/SPARK-33667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-33667: --- Description: SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config *spark.sql.caseSensitive* which is false by default, for instance: {code:sql} spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int) > USING parquet > PARTITIONED BY (year, month); spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1; spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1); Error in query: Non-partitioning column(s) [YEAR, Month] are specified for SHOW PARTITIONS; {code} was: SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config *spark.sql.caseSensitive* which is true by default, for instance: {code:sql} spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int) > USING parquet > PARTITIONED BY (year, month); spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1; spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1); Error in query: Non-partitioning column(s) [YEAR, Month] are specified for SHOW PARTITIONS; {code} > Respect case sensitivity in V1 SHOW PARTITIONS > -- > > Key: SPARK-33667 > URL: https://issues.apache.org/jira/browse/SPARK-33667 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.8, 3.0.2, 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > SHOW PARTITIONS is case sensitive, and doesn't respect the SQL config > *spark.sql.caseSensitive* which is false by default, for instance: > {code:sql} > spark-sql> CREATE TABLE tbl1 (price int, qty int, year int, month int) > > USING parquet > > PARTITIONED BY (year, month); > spark-sql> INSERT INTO tbl1 PARTITION(year = 2015, month = 1) SELECT 1, 1; > spark-sql> SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1); > Error in query: Non-partitioning column(s) [YEAR, Month] are specified for > SHOW PARTITIONS; > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
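For illustration, the intended outcome with the default setting (my own sketch, not output from a fixed build):
{code:sql}
-- With the default spark.sql.caseSensitive=false, the spec columns should be resolved
-- case-insensitively against the partition schema, so this should succeed:
SET spark.sql.caseSensitive=false;
SHOW PARTITIONS tbl1 PARTITION(YEAR = 2015, Month = 1);
-- expected: year=2015/month=1
{code}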
[jira] [Created] (SPARK-33670) Verify the partition provider is Hive in v1 SHOW TABLE EXTENDED
Maxim Gekk created SPARK-33670: -- Summary: Verify the partition provider is Hive in v1 SHOW TABLE EXTENDED Key: SPARK-33670 URL: https://issues.apache.org/jira/browse/SPARK-33670 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.2, 3.1.0 Reporter: Maxim Gekk Invoke the check verifyPartitionProviderIsHive() from v1 implementation of SHOW TABLE EXTENDED. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33671) Remove VIEW checks from V1 table commands
Maxim Gekk created SPARK-33671: -- Summary: Remove VIEW checks from V1 table commands Key: SPARK-33671 URL: https://issues.apache.org/jira/browse/SPARK-33671 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.2, 3.1.0 Reporter: Maxim Gekk Checking of VIEWs is performed earlier, see https://github.com/apache/spark/pull/30461 . So, the checks can be removed from some V1 commands. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33672) Check SQLContext.tables() for V2 session catalog
Maxim Gekk created SPARK-33672: -- Summary: Check SQLContext.tables() for V2 session catalog Key: SPARK-33672 URL: https://issues.apache.org/jira/browse/SPARK-33672 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk The V1 ShowTablesCommand is hard-coded in SQLContext: https://github.com/apache/spark/blob/a088a801ed8c17171545c196a3f26ce415de0cd1/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L671 The ticket aims to check the tables() behavior for the V2 session catalog. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
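For reference, a minimal usage sketch of the API in question (how the V2 session catalog is plugged in, e.g. via spark.sql.catalog.spark_catalog, is assumed and omitted here):
{code:scala}
// With a custom V2 session catalog configured, tables() should still reflect that
// catalog's tables rather than only the hard-coded V1 command's view.
sql("CREATE TABLE default.t1 (i INT) USING parquet")
spark.sqlContext.tables("default").show()
{code}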
[jira] [Commented] (SPARK-33672) Check SQLContext.tables() for V2 session catalog
[ https://issues.apache.org/jira/browse/SPARK-33672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17244542#comment-17244542 ] Maxim Gekk commented on SPARK-33672: [~cloud_fan] FYI > Check SQLContext.tables() for V2 session catalog > > > Key: SPARK-33672 > URL: https://issues.apache.org/jira/browse/SPARK-33672 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > V1 ShowTablesCommand is hard coded in SQLContext: > https://github.com/apache/spark/blob/a088a801ed8c17171545c196a3f26ce415de0cd1/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L671 > The ticket aims to checks tables() behavior for V2 session catalog. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33676) Require exact matched partition spec to schema in ADD/DROP PARTITION
Maxim Gekk created SPARK-33676: -- Summary: Require exact matched partition spec to schema in ADD/DROP PARTITION Key: SPARK-33676 URL: https://issues.apache.org/jira/browse/SPARK-33676 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk The V1 implementation of ALTER TABLE .. ADD/DROP PARTITION fails when the partition spec doesn't exactly match the partition schema: {code:sql} ALTER TABLE tab1 ADD PARTITION (A='9') Partition spec is invalid. The spec (a) must match the partition spec (a, b) defined in table '`dbx`.`tab1`'; org.apache.spark.sql.AnalysisException: Partition spec is invalid. The spec (a) must match the partition spec (a, b) defined in table '`dbx`.`tab1`'; at org.apache.spark.sql.catalyst.catalog.SessionCatalog.$anonfun$requireExactMatchedPartitionSpec$1(SessionCatalog.scala:1173) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.$anonfun$requireExactMatchedPartitionSpec$1$adapted(SessionCatalog.scala:1171) at scala.collection.immutable.List.foreach(List.scala:392) {code} for a table partitioned by "a", "b", but the V2 implementation silently adds the wrong partition. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
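A hedged sketch of the expected V2 behavior (testcat.tbl is a hypothetical V2 table partitioned by (a, b)):
{code:sql}
ALTER TABLE testcat.tbl ADD PARTITION (a = '9');
-- should be rejected with a "spec (a) must match the partition spec (a, b)"-style error,
-- as in V1, instead of silently adding a partition with an incomplete spec.
{code}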
[jira] [Created] (SPARK-33688) Migrate SHOW TABLE EXTENDED to new resolution framework
Maxim Gekk created SPARK-33688: -- Summary: Migrate SHOW TABLE EXTENDED to new resolution framework Key: SPARK-33688 URL: https://issues.apache.org/jira/browse/SPARK-33688 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk # Create the Command logical node for SHOW TABLE EXTENDED # Remove ShowTableStatement -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33688) Migrate SHOW TABLE EXTENDED to new resolution framework
[ https://issues.apache.org/jira/browse/SPARK-33688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245143#comment-17245143 ] Maxim Gekk commented on SPARK-33688: I am working on this. > Migrate SHOW TABLE EXTENDED to new resolution framework > --- > > Key: SPARK-33688 > URL: https://issues.apache.org/jira/browse/SPARK-33688 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > # Create the Command logical node for SHOW TABLE EXTENDED > # Remove ShowTableStatement -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33670) Verify the partition provider is Hive in v1 SHOW TABLE EXTENDED
[ https://issues.apache.org/jira/browse/SPARK-33670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245174#comment-17245174 ] Maxim Gekk commented on SPARK-33670: [~hyukjin.kwon] Just in case, which "Affects Version" should be pointed out - an already released version or the current unreleased one? For example, I pointed out 3.0.2, but it has not been released yet. Maybe I should set 3.0.1? > Verify the partition provider is Hive in v1 SHOW TABLE EXTENDED > --- > > Key: SPARK-33670 > URL: https://issues.apache.org/jira/browse/SPARK-33670 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.2, 3.1.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > Fix For: 2.4.8, 3.0.2, 3.1.0 > > > Invoke the check verifyPartitionProviderIsHive() from v1 implementation of > SHOW TABLE EXTENDED. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33441) Add "unused-import" compile arg to scalac and remove all unused imports in Scala code
[ https://issues.apache.org/jira/browse/SPARK-33441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245723#comment-17245723 ] Maxim Gekk commented on SPARK-33441: Would it be possible to check unused imports in Java code? > Add "unused-import" compile arg to scalac and remove all unused imports in > Scala code > -- > > Key: SPARK-33441 > URL: https://issues.apache.org/jira/browse/SPARK-33441 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.1.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.1.0 > > > * Add new scala compile arg to defense against new unused imports: > ** "-Ywarn-unused-import" for Scala 2.12 > ** "-Wconf:cat=unused-imports:ws" or "-Wconf:cat=unused-imports:error" for > Scala 2.13 > * Remove all unused imports in Scala code -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33706) Require fully specified partition identifier in partitionExists()
Maxim Gekk created SPARK-33706: -- Summary: Require fully specified partition identifier in partitionExists() Key: SPARK-33706 URL: https://issues.apache.org/jira/browse/SPARK-33706 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk Currently, partitionExists() from SupportsPartitionManagement accepts any partition identifier, even one that is not fully specified. This ticket aims to add a check that the lengths of the partition schema and the partition identifier match exactly, i.e. to prohibit partially specified IDs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
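A standalone illustration of the proposed length check (my own sketch; the real change would live in implementations of SupportsPartitionManagement.partitionExists, whose exact shape may differ):
{code:scala}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}
import org.apache.spark.unsafe.types.UTF8String

// Example partition schema with two partition columns.
val partitionSchema = new StructType().add("part0", IntegerType).add("part1", StringType)

// Reject identifiers whose length does not match the partition schema.
def requireFullySpecified(ident: InternalRow): Unit =
  require(ident.numFields == partitionSchema.length,
    s"Partition identifier must be fully specified: expected ${partitionSchema.length} " +
      s"field(s), got ${ident.numFields}")

requireFullySpecified(InternalRow(0, UTF8String.fromString("abc"))) // passes
// requireFullySpecified(InternalRow(0))                            // would throw
{code}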
[jira] [Updated] (SPARK-33558) Unify v1 and v2 ALTER TABLE .. ADD PARTITION tests
[ https://issues.apache.org/jira/browse/SPARK-33558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-33558: --- Summary: Unify v1 and v2 ALTER TABLE .. ADD PARTITION tests (was: Unify v1 and v2 ALTER TABLE .. PARTITION tests) > Unify v1 and v2 ALTER TABLE .. ADD PARTITION tests > -- > > Key: SPARK-33558 > URL: https://issues.apache.org/jira/browse/SPARK-33558 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Extract ALTER TABLE .. PARTITION tests to the common place to run them for V1 > and v2 datasources. Some tests can be places to V1 and V2 specific test > suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33558) Unify v1 and v2 ALTER TABLE .. ADD PARTITION tests
[ https://issues.apache.org/jira/browse/SPARK-33558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-33558: --- Description: Extract ALTER TABLE .. ADD PARTITION tests to the common place to run them for V1 and v2 datasources. Some tests can be places to V1 and V2 specific test suites. (was: Extract ALTER TABLE .. PARTITION tests to the common place to run them for V1 and v2 datasources. Some tests can be places to V1 and V2 specific test suites.) > Unify v1 and v2 ALTER TABLE .. ADD PARTITION tests > -- > > Key: SPARK-33558 > URL: https://issues.apache.org/jira/browse/SPARK-33558 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Extract ALTER TABLE .. ADD PARTITION tests to the common place to run them > for V1 and v2 datasources. Some tests can be places to V1 and V2 specific > test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly
[ https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246717#comment-17246717 ] Maxim Gekk commented on SPARK-33571: > The behavior of the to be introduced in Spark 3.1 > `spark.sql.legacy.parquet.int96RebaseModeIn*` is the same as for > `datetimeRebaseModeIn*`? Yes. > So Spark will check the parquet metadata for Spark version and the > `datetimeRebaseModeInRead` metadata key and use the correct behavior. Correct, except for the names of the metadata keys. Spark checks these keys, see https://github.com/MaxGekk/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/package.scala#L58-L68 > If those are not set it will raise an exception and ask the user to define > the mode. Is that correct? Yes. Spark should raise the exception if it is not clear which calendar the writer used. > but from my testing Spark 3 does the same by default, not sure if that aligns > with your findings? Spark 3.0.0-SNAPSHOT saved timestamps as TIMESTAMP_MICROS in parquet till https://github.com/apache/spark/pull/28450 . I just wanted to say that the configs datetimeRebaseModeIn* you pointed out don't affect INT96 in Spark 3.0. > What is the expected behavior for TIMESTAMP_MICROS and TIMESTAMP_MILLIS with > regards to this? The same as for the DATE type. Spark takes into account the same SQL configs and metadata keys from parquet files. > Handling of hybrid to proleptic calendar when reading and writing Parquet > data not working correctly > > > Key: SPARK-33571 > URL: https://issues.apache.org/jira/browse/SPARK-33571 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Simon >Priority: Major > Fix For: 3.1.0 > > > The handling of old dates written with older Spark versions (<2.4.6) using > the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working > correctly. > From what I understand it should work like this: > * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before > 1900-01-01T00:00:00Z > * Only applies when reading or writing parquet files > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time a > `SparkUpgradeException` should be raised informing the user to choose either > `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead` > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time and > `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should > show the same values in Spark 3.0.1. with for example `df.show()` as they did > in Spark 2.4.5 > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time and > `datetimeRebaseModeInRead` is set to `CORRECTED` the dates and timestamps > should show different values in Spark 3.0.1. with for example `df.show()` as > they did in Spark 2.4.5 > * When writing parquet files with Spark > 3.0.0 which contain dates or > timestamps before the above mentioned moment in time a > `SparkUpgradeException` should be raised informing the user to choose either > `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite` > First of all I'm not 100% sure all of this is correct. I've been unable to > find any clear documentation on the expected behavior.
The understanding I > have was pieced together from the mailing list > ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html)] > the blog post linked there and looking at the Spark code. > From our testing we're seeing several issues: > * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. > that contains fields of type `TimestampType` which contain timestamps before > the above mentioned moments in time without `datetimeRebaseModeInRead` set > doesn't raise the `SparkUpgradeException`, it succeeds without any changes to > the resulting dataframe compared to that dataframe in Spark 2.4.5 > * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. > that contains fields of type `TimestampType` or `DateType` which contain > dates or timestamps before the above mentioned moments in time with > `datetimeRebaseModeInRead` set to `LEGACY` results in the same values in the > dataframe as when using `CORRECTED`, so it seems like no rebasing is > happening. > I've made some scripts to help with testing/show the behavior, it uses > pyspark 2.4.5, 2.4.6 and 3.0.1. You can find them here > [https://github.com/simonvanderveldt/spark3-reba
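For anyone who wants to see which calendar information a given file carries, here is a sketch that dumps the parquet footer key/value metadata the reader consults (parquet-hadoop is already on Spark's classpath; the key names, e.g. "org.apache.spark.version", come from the package.scala link above and may differ between versions):
{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile
import scala.collection.JavaConverters._

val reader = ParquetFileReader.open(
  HadoopInputFile.fromPath(new Path("/path/to/part-00000.snappy.parquet"), new Configuration()))
try {
  // Prints pairs such as the writer's Spark version and any rebase-related keys.
  reader.getFooter.getFileMetaData.getKeyValueMetaData.asScala.foreach { case (k, v) =>
    println(s"$k -> $v")
  }
} finally {
  reader.close()
}
{code}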
[jira] [Created] (SPARK-33742) Throw PartitionsAlreadyExistException from HiveExternalCatalog.createPartitions()
Maxim Gekk created SPARK-33742: -- Summary: Throw PartitionsAlreadyExistException from HiveExternalCatalog.createPartitions() Key: SPARK-33742 URL: https://issues.apache.org/jira/browse/SPARK-33742 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.1, 2.4.7, 3.1.0 Reporter: Maxim Gekk HiveExternalCatalog.createPartitions throws AlreadyExistsException wrapped by AnalysisException. The behavior deviates from V1/V2 in-memory catalogs that throw PartitionsAlreadyExistException. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
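A test-style sketch of the expected behavior after the fix (ScalaTest's intercept, a Hive-enabled session, and the table name are assumptions; the exception class comes from the description):
{code:scala}
import org.apache.spark.sql.catalyst.analysis.PartitionsAlreadyExistException

// Adding the same partition twice without IF NOT EXISTS should surface
// PartitionsAlreadyExistException instead of a generic AnalysisException
// wrapping Hive's AlreadyExistsException.
sql("CREATE TABLE hive_part_tbl (c INT) PARTITIONED BY (p INT) STORED AS PARQUET")
sql("ALTER TABLE hive_part_tbl ADD PARTITION (p = 1)")
intercept[PartitionsAlreadyExistException] {
  sql("ALTER TABLE hive_part_tbl ADD PARTITION (p = 1)")
}
{code}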
[jira] [Created] (SPARK-33767) Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests
Maxim Gekk created SPARK-33767: -- Summary: Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests Key: SPARK-33767 URL: https://issues.apache.org/jira/browse/SPARK-33767 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk Assignee: Maxim Gekk Fix For: 3.1.0 Extract ALTER TABLE .. ADD PARTITION tests to the common place to run them for V1 and v2 datasources. Some tests can be places to V1 and V2 specific test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33767) Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests
[ https://issues.apache.org/jira/browse/SPARK-33767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-33767: --- Description: Extract ALTER TABLE .. DROP PARTITION tests to the common place to run them for V1 and v2 datasources. Some tests can be places to V1 and V2 specific test suites. (was: Extract ALTER TABLE .. ADD PARTITION tests to the common place to run them for V1 and v2 datasources. Some tests can be places to V1 and V2 specific test suites.) > Unify v1 and v2 ALTER TABLE .. DROP PARTITION tests > --- > > Key: SPARK-33767 > URL: https://issues.apache.org/jira/browse/SPARK-33767 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.1.0 > > > Extract ALTER TABLE .. DROP PARTITION tests to the common place to run them > for V1 and v2 datasources. Some tests can be places to V1 and V2 specific > test suites. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33768) Remove unused parameter `retainData` from AlterTableDropPartition
Maxim Gekk created SPARK-33768: -- Summary: Remove unused parameter `retainData` from AlterTableDropPartition Key: SPARK-33768 URL: https://issues.apache.org/jira/browse/SPARK-33768 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The parameter is hard-coded to false while parsing in AstBuilder. The parameter can be removed from the logical node. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33770) Test failures: ALTER TABLE .. DROP PARTITION tries to delete files out of partition path
Maxim Gekk created SPARK-33770: -- Summary: Test failures: ALTER TABLE .. DROP PARTITION tries to delete files out of partition path Key: SPARK-33770 URL: https://issues.apache.org/jira/browse/SPARK-33770 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk For example: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132719/testReport/org.apache.spark.sql.hive.execution.command/AlterTableAddPartitionSuite/ALTER_TABLEADD_PARTITION_Hive_V1__SPARK_33521__universal_type_conversions_of_partition_values/ {code:java} org.apache.spark.sql.hive.execution.command.AlterTableAddPartitionSuite.ALTER TABLE .. ADD PARTITION Hive V1: SPARK-33521: universal type conversions of partition values sbt.ForkMain$ForkError: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: File file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-38fe2706-33e5-469a-ba3a-682391e02179 does not exist; at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112) at org.apache.spark.sql.hive.HiveExternalCatalog.dropPartitions(HiveExternalCatalog.scala:1014) at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.dropPartitions(ExternalCatalogWithListener.scala:211) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.dropPartitions(SessionCatalog.scala:1036) at org.apache.spark.sql.execution.command.AlterTableDropPartitionCommand.run(ddl.scala:582) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33777) Sort output of V2 SHOW PARTITIONS
[ https://issues.apache.org/jira/browse/SPARK-33777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-33777: --- Summary: Sort output of V2 SHOW PARTITIONS (was: Sort output of SHOW PARTITIONS V2) > Sort output of V2 SHOW PARTITIONS > - > > Key: SPARK-33777 > URL: https://issues.apache.org/jira/browse/SPARK-33777 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > V1 SHOW PARTITIONS command sorts its results. Both V1 implementations > in-memory and Hive catalog (according to Hive docs > [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowPartitions)] > perform sorting. V2 should have the same behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33777) Sort output of SHOW PARTITIONS V2
Maxim Gekk created SPARK-33777: -- Summary: Sort output of SHOW PARTITIONS V2 Key: SPARK-33777 URL: https://issues.apache.org/jira/browse/SPARK-33777 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk V1 SHOW PARTITIONS command sorts its results. Both V1 implementations in-memory and Hive catalog (according to Hive docs [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowPartitions)] perform sorting. V2 should have the same behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
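A hypothetical illustration of the intended ordering for a V2 table partitioned by (year, month):
{code:sql}
SHOW PARTITIONS testcat.tbl;
-- expected (sorted, as with V1/Hive):
-- year=2015/month=1
-- year=2015/month=2
-- year=2016/month=1
{code}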