[jira] [Reopened] (SPARK-22439) Not able to get numeric columns for the file having decimal values
[ https://issues.apache.org/jira/browse/SPARK-22439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navya Krishnappa reopened SPARK-22439:
--------------------------------------

> Not able to get numeric columns for the file having decimal values
> -------------------------------------------------------------------
>
>                 Key: SPARK-22439
>                 URL: https://issues.apache.org/jira/browse/SPARK-22439
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API, SQL
>    Affects Versions: 2.2.0
>            Reporter: Navya Krishnappa
>
> When reading the below-mentioned decimal values with header specified as true, dataset.numericColumns() fails.
>
> SourceFile:
> 8.95977565356765764E+20
> 8.95977565356765764E+20
> 8.95977565356765764E+20
>
> Source code1:
> Dataset dataset = getSqlContext().read()
>     .option(PARSER_LIB, "commons")
>     .option(INFER_SCHEMA, "true")
>     .option(HEADER, "true")
>     .option(DELIMITER, ",")
>     .option(QUOTE, "\"")
>     .option(ESCAPE, "\\")
>     .option(MODE, Mode.PERMISSIVE)
>     .csv(sourceFile);
> dataset.numericColumns()
>
> Result:
> Caused by: java.util.NoSuchElementException: None.get
>     at scala.None$.get(Option.scala:347)
>     at scala.None$.get(Option.scala:345)
>     at org.apache.spark.sql.Dataset$$anonfun$numericColumns$2.apply(Dataset.scala:223)
>     at org.apache.spark.sql.Dataset$$anonfun$numericColumns$2.apply(Dataset.scala:222)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>     at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>     at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
[jira] [Commented] (SPARK-22439) Not able to get numeric columns for the file having decimal values
[ https://issues.apache.org/jira/browse/SPARK-22439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249341#comment-16249341 ]

Navya Krishnappa commented on SPARK-22439:
------------------------------------------

[~sowen] Thank you for your response. If we just add a header to the above-given data, it works fine; I don't understand why only the header change makes the difference. Let me know if you need more inputs.

SourceFile:
Column1
8.95977565356765764E+20
8.95977565356765764E+20
8.95977565356765764E+20

Source code1:
Dataset dataset = getSqlContext().read()
    .option(PARSER_LIB, "commons")
    .option(INFER_SCHEMA, "true")
    .option(HEADER, "true")
    .option(DELIMITER, ",")
    .option(QUOTE, "\"")
    .option(ESCAPE, "\\")
    .option(MODE, Mode.PERMISSIVE)
    .csv(sourceFile);
dataset.numericColumns()

Inferred type: Column1 - decimal(18,-3)
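[Editorial aside] A minimal sketch, not from the ticket, of one way to sidestep the odd inferred type: read the file as in the report, then cast the inferred decimal(18,-3) column to double before doing any numeric-column work. Scala for spark-shell; the file path is a placeholder.

import org.apache.spark.sql.functions.col

val dataset = spark.read
  .option("inferSchema", "true")
  .option("header", "true")
  .csv("sourceFile.csv")  // placeholder path

// Inference can give decimal(18,-3) for 8.95977565356765764E+20; a cast to
// double yields an ordinary numeric type for downstream numeric handling.
val fixed = dataset.withColumn("Column1", col("Column1").cast("double"))
fixed.printSchema()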
[jira] [Comment Edited] (SPARK-20387) Permissive mode is not replacing corrupt record with null
[ https://issues.apache.org/jira/browse/SPARK-20387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16249301#comment-16249301 ]

Navya Krishnappa edited comment on SPARK-20387 at 11/13/17 9:44 AM:
--------------------------------------------------------------------

Not all corrupt values are being replaced with null. Refer to the scenario below:

Source File:
'Col1','Col2','Col3','Col4','Col5','Col6',
'1000','abc','10yui000','400','20.8','2003-03-04',
'1001','xyz','3','4000','20.8','2003-03-04',
'1002','abc','4','40,000','20.8','2003-03-04'
'1003','xyz','5','40,','20.8','2003-03-04'
'1004','abc','6','40,000','20.8','2003-03-04'

User_defined_Schema:
[{ "dataType": "integer", "type": "Measure", "name": "Col1" },
 { "dataType": "string", "type": "Dimension", "name": "Col2" },
 { "dataType": "float", "type": "Measure", "name": "Col3" },
 { "dataType": "string", "type": "Dimension", "name": "Col4" },
 { "dataType": "double", "type": "Measure", "name": "Col5" },
 { "dataType": "date", "type": "Dimension", "name": "Col6" },
 { "dataType": "string", "type": "Dimension", "name": "_c6" }]

Source code1:
Dataset dataset = sparkSession.read().schema(User_defined_Schema)
    .option(PARSER_LIB, "commons")
    .option(DELIMITER, ",")
    .option(QUOTE, "\"")
    .option(MODE, Mode.PERMISSIVE)
    .csv(sourceFile);
dataset.collect();

Result: 10yui000 is parsed as 10
Row: '1000','abc','10','400','20.8','2003-03-04',

Expected: According to PERMISSIVE mode, 10yui000 should be replaced with null.
[jira] [Reopened] (SPARK-20387) Permissive mode is not replacing corrupt record with null
[ https://issues.apache.org/jira/browse/SPARK-20387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navya Krishnappa reopened SPARK-20387:
--------------------------------------

(Reopened with the same scenario given in the comment above: the corrupt value 10yui000 is partially parsed as 10 instead of being replaced with null.)
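[Editorial aside] A sketch of how PERMISSIVE mode is usually paired with columnNameOfCorruptRecord (supported for CSV in later 2.x releases) so that a malformed token like 10yui000 surfaces as a corrupt record rather than a partially parsed number. The schema mirrors the report's; the file path is a placeholder.

import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("Col1", IntegerType),
  StructField("Col2", StringType),
  StructField("Col3", FloatType),
  StructField("Col4", StringType),
  StructField("Col5", DoubleType),
  StructField("Col6", DateType),
  StructField("_corrupt_record", StringType)))

val ds = spark.read
  .schema(schema)
  .option("quote", "'")
  .option("mode", "PERMISSIVE")
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .csv("sourceFile.csv")  // placeholder path

// Rows whose Col3 token cannot be parsed should appear here with nulls in the
// typed columns and the raw line preserved in _corrupt_record.
ds.filter(ds("_corrupt_record").isNotNull).show(false)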
[jira] [Updated] (SPARK-22439) Not able to get numeric columns for the file having decimal values
[ https://issues.apache.org/jira/browse/SPARK-22439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navya Krishnappa updated SPARK-22439:
-------------------------------------
    Summary: Not able to get numeric columns for the file having decimal values  (was: Not able to get numeric columns for the attached file)
[jira] [Updated] (SPARK-22439) Not able to get numeric columns for the attached file
[ https://issues.apache.org/jira/browse/SPARK-22439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navya Krishnappa updated SPARK-22439:
-------------------------------------
    Summary: Not able to get numeric columns for the attached file  (was: Not able to get numeric column for the attached file)
[jira] [Created] (SPARK-22439) Not able to get numeric column for the attached file
Navya Krishnappa created SPARK-22439:
-------------------------------------

             Summary: Not able to get numeric column for the attached file
                 Key: SPARK-22439
                 URL: https://issues.apache.org/jira/browse/SPARK-22439
             Project: Spark
          Issue Type: Bug
          Components: Java API, SQL
    Affects Versions: 2.2.0
            Reporter: Navya Krishnappa
            Priority: Major

(Description as quoted in full in the reopen notice above.)
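[Editorial aside] One plausible reading of the None.get, offered as a guess rather than the ticket's diagnosis: when header is true but the file's first line is a data value such as 8.95977565356765764E+20, that token becomes a column name containing dots, which can defeat column-name resolution inside numericColumns(). A defensive sketch in Scala that renames columns to dot-free names and lists the numeric ones; the path is a placeholder.

import org.apache.spark.sql.types.NumericType

val dataset = spark.read
  .option("inferSchema", "true")
  .option("header", "true")
  .csv("sourceFile.csv")  // placeholder path

// Rename every column to a synthetic, dot-free name.
val safe = dataset.toDF(dataset.columns.indices.map(i => s"c$i"): _*)

// List the columns Spark itself classifies as numeric.
safe.schema.fields
  .filter(_.dataType.isInstanceOf[NumericType])
  .foreach(f => println(s"${f.name}: ${f.dataType}"))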
[jira] [Reopened] (SPARK-22020) Support session local timezone
[ https://issues.apache.org/jira/browse/SPARK-22020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navya Krishnappa reopened SPARK-22020:
--------------------------------------

This is not working as expected. Please refer to the description quoted below.

> Support session local timezone
> -------------------------------
>
>                 Key: SPARK-22020
>                 URL: https://issues.apache.org/jira/browse/SPARK-22020
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Navya Krishnappa
>
> As of Spark 2.1, Spark SQL assumes the machine timezone for datetime manipulation, which is bad if users are not in the same timezones as the machines, or if different users have different timezones.
>
> Input data:
> Date,SparkDate,SparkDate1,SparkDate2
> 04/22/2017T03:30:02,2017-03-21T03:30:02,2017-03-21T03:30:02.02Z,2017-03-21T00:00:00Z
>
> I have set the below value to set the timezone to UTC, but the current timezone offset is still added even though the input is already in UTC format:
> spark.conf.set("spark.sql.session.timeZone", "UTC")
>
> Expected: the time should remain the same as the input, since it is already in UTC format.
>
> var df1 = spark.read.option("delimiter", ",").option("qualifier", "\"").option("inferSchema", "true").option("header", "true").option("mode", "PERMISSIVE").option("timestampFormat", "MM/dd/yyyy'T'HH:mm:ss.SSS").option("dateFormat", "MM/dd/yyyy'T'HH:mm:ss").csv("DateSpark.csv");
> df1: org.apache.spark.sql.DataFrame = [Name: string, Age: int ... 5 more fields]
>
> scala> df1.show(false);
> +----+---+----+-------------------+-------------------+----------------------+-------------------+
> |Name|Age|Add |Date               |SparkDate          |SparkDate1            |SparkDate2         |
> +----+---+----+-------------------+-------------------+----------------------+-------------------+
> |abc |21 |bvxc|04/22/2017T03:30:02|2017-03-21 03:30:02|2017-03-21 09:00:02.02|2017-03-21 05:30:00|
> +----+---+----+-------------------+-------------------+----------------------+-------------------+
[jira] [Created] (SPARK-22020) Support session local timezone
Navya Krishnappa created SPARK-22020:
-------------------------------------

             Summary: Support session local timezone
                 Key: SPARK-22020
                 URL: https://issues.apache.org/jira/browse/SPARK-22020
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Navya Krishnappa

(Description as quoted in full in the reopen notice above.)
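[Editorial aside] A sketch of the intended usage, assuming Spark 2.2's behaviour: set the session timezone before reading, and give timestampFormat the full pattern (the yyyy restored here appears to have been dropped by the mail rendering above). Whether show() then renders the UTC instants unchanged is exactly what this ticket disputes.

spark.conf.set("spark.sql.session.timeZone", "UTC")

val df1 = spark.read
  .option("delimiter", ",")
  .option("inferSchema", "true")
  .option("header", "true")
  .option("mode", "PERMISSIVE")
  .option("timestampFormat", "MM/dd/yyyy'T'HH:mm:ss")
  .csv("DateSpark.csv")  // file from the report

// With the session zone set to UTC, timestamps should be rendered in UTC.
df1.show(false)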
[jira] [Comment Edited] (SPARK-18877) Unable to read given csv data. Exception: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
[ https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072240#comment-16072240 ]

Navya Krishnappa edited comment on SPARK-18877 at 7/4/17 5:42 AM:
------------------------------------------------------------------

[~dongjoon] I have created a Parquet bug for the invalid-scale issue in the Decimal data type, but the Parquet team says it is a Spark issue. Please refer to https://issues.apache.org/jira/browse/PARQUET-815 and add your comments.

> Unable to read given csv data. Exception: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
> ------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18877
>                 URL: https://issues.apache.org/jira/browse/SPARK-18877
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2
>            Reporter: Navya Krishnappa
>            Assignee: Dongjoon Hyun
>             Fix For: 2.0.3, 2.1.1, 2.2.0
>
> When reading the below-mentioned csv data, the following exception is thrown even though the maximum decimal precision is 38:
>
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
>
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253
[jira] [Comment Edited] (SPARK-18877) Unable to read given csv data. Exception: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
[ https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072240#comment-16072240 ]

Navya Krishnappa edited comment on SPARK-18877 at 7/3/17 10:33 AM:
-------------------------------------------------------------------

[~dongjoon] I have created a Parquet bug for the invalid-scale issue in the Decimal data type, but the Parquet team says it is a Spark issue. Please refer to https://issues.apache.org/jira/browse/PARQUET-815 and add your comments.
[jira] [Commented] (SPARK-18877) Unable to read given csv data. Exception: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
[ https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072240#comment-16072240 ]

Navya Krishnappa commented on SPARK-18877:
------------------------------------------

[~dongjoon] I have created a Parquet bug for the invalid-scale issue in the Decimal data type, but the Parquet team says it is a Spark issue. Please refer to https://issues.apache.org/jira/browse/PARQUET-815 and add your comments.
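[Editorial aside] For the precision error itself, a sketch of the usual way around inference limits: declare the column as decimal(38,0) up front, 38 being Spark SQL's maximum decimal precision. Illustrative only; the path is a placeholder.

import org.apache.spark.sql.types.{DecimalType, StructField, StructType}

val schema = StructType(Seq(StructField("Decimal", DecimalType(38, 0))))

val ds = spark.read
  .option("header", "true")
  .schema(schema)       // bypass inference for the wide integers
  .csv("decimals.csv")  // placeholder path
ds.show(false)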
[jira] [Commented] (SPARK-21263) NumberFormatException is not thrown while converting an invalid string to float/double
[ https://issues.apache.org/jira/browse/SPARK-21263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072234#comment-16072234 ]

Navya Krishnappa commented on SPARK-21263:
------------------------------------------

[~sowen] & [~hyukjin.kwon] Thanks for your comments. Please let me know the resolution for this issue.
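[Editorial aside] A sketch of how stricter behaviour can be requested, under the assumption that FAILFAST aborts on the first value it cannot convert (behaviour varies across 2.x releases). Schema and path are stand-ins for the report's.

import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("PatientID", IntegerType),
  StructField("PatientName", StringType),
  StructField("TotalBill", DoubleType)))

val strict = spark.read
  .schema(schema)
  .option("quote", "'")
  .option("mode", "FAILFAST")
  .csv("patients.csv")  // placeholder path

// Should raise on "10u000" instead of silently yielding 10.0.
strict.collect()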
[jira] [Updated] (SPARK-21263) NumberFormatException is not thrown while converting an invalid string to float/double
[ https://issues.apache.org/jira/browse/SPARK-21263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navya Krishnappa updated SPARK-21263:
-------------------------------------
    Description:

When reading the below-mentioned data by specifying a user-defined schema, the expected exception is not thrown in all cases. Refer to the details:

*Data:*
'PatientID','PatientName','TotalBill'
'1000','Patient1','10u000'
'1001','Patient2','3'
'1002','Patient3','4'
'1003','Patient4','5'
'1004','Patient5','6'

*Source code*:
Dataset dataset = sparkSession.read().schema(schema)
    .option(INFER_SCHEMA, "true")
    .option(DELIMITER, ",")
    .option(QUOTE, "\"")
    .option(MODE, Mode.PERMISSIVE)
    .csv(sourceFile);

When we collect the dataset data:
dataset.collectAsList();

*Schema1*:
[StructField(PatientID,IntegerType,true), StructField(PatientName,StringType,true), StructField(TotalBill,IntegerType,true)]

*Result*: Throws NumberFormatException
Caused by: java.lang.NumberFormatException: For input string: "10u000"

*Schema2*:
[StructField(PatientID,IntegerType,true), StructField(PatientName,StringType,true), StructField(TotalBill,DoubleType,true)]

*Actual Result*:
"PatientID": 1000,
"NumberOfVisits": "400",
"TotalBill": 10,

*Expected Result*: Should throw NumberFormatException for input string "10u000"
[jira] [Updated] (SPARK-21263) NumberFormatException is not thrown while converting an invalid string to float/double
[ https://issues.apache.org/jira/browse/SPARK-21263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navya Krishnappa updated SPARK-21263:
-------------------------------------
    Description: (minor wording edit to the description; the final text appears in the update above)
[jira] [Updated] (SPARK-21263) NumberFormatException is not thrown while converting an invalid string to float/double
[ https://issues.apache.org/jira/browse/SPARK-21263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navya Krishnappa updated SPARK-21263:
-------------------------------------
    Description: (formatting edit adding emphasis markers; the final text appears in the update above)
[jira] [Updated] (SPARK-21263) NumberFormatException is not thrown while converting an invalid string to float/double
[ https://issues.apache.org/jira/browse/SPARK-21263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navya Krishnappa updated SPARK-21263:
-------------------------------------
    Description: (edit adding the Source code block to the original description; the final text appears in the update above)
[jira] [Created] (SPARK-21263) Exception is not thrown while converting an invalid string to float/double
Navya Krishnappa created SPARK-21263:
-------------------------------------

             Summary: Exception is not thrown while converting an invalid string to float/double
                 Key: SPARK-21263
                 URL: https://issues.apache.org/jira/browse/SPARK-21263
             Project: Spark
          Issue Type: Bug
          Components: Java API
    Affects Versions: 2.1.1
            Reporter: Navya Krishnappa

When reading the below-mentioned data by specifying a user-defined schema, the expected exception is not thrown in all cases.

Data:
'PatientID','PatientName','TotalBill'
'1000','Patient1','10u000'
'1001','Patient2','3'
'1002','Patient3','4'
'1003','Patient4','5'
'1004','Patient5','6'

Schema1:
[StructField(PatientID,IntegerType,true), StructField(PatientName,StringType,true), StructField(TotalBill,IntegerType,true)]

Result: Throws NumberFormatException
Caused by: java.lang.NumberFormatException: For input string: "10u000"

Schema2:
[StructField(PatientID,IntegerType,true), StructField(PatientName,StringType,true), StructField(TotalBill,DoubleType,true)]

Actual Result:
"PatientID": 1000,
"NumberOfVisits": "400",
"TotalBill": 10,

Expected Result: Should throw NumberFormatException for input string "10u000"
[jira] [Updated] (SPARK-21263) NumberFormatException is not thrown while converting an invalid string to float/double
[ https://issues.apache.org/jira/browse/SPARK-21263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navya Krishnappa updated SPARK-21263:
-------------------------------------
    Summary: NumberFormatException is not thrown while converting an invalid string to float/double  (was: Exception is not thrown while converting an invalid string to float/double)
[jira] [Comment Edited] (SPARK-18877) Unable to read given csv data. Exception: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
[ https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756554#comment-15756554 ]

Navya Krishnappa edited comment on SPARK-18877 at 5/23/17 4:21 AM:
-------------------------------------------------------------------

Thank you for replying, [~dongjoon]. Can you help me understand whether the above-mentioned PR will resolve the issue below?

I have another issue with respect to the decimal scale: when I read the below-mentioned csv source file and create a parquet file from it, java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9 is thrown.

The source file content is:
Row(column name)
9.03E+12
1.19E+11

Refer to the code used to read the csv file and create the parquet file:

// Read the csv file
Dataset dataset = getSqlContext().read()
    .option(HEADER, "true")
    .option(PARSER_LIB, "commons")
    .option(INFER_SCHEMA, "true")
    .option(DELIMITER, ",")
    .option(QUOTE, "\"")
    .option(ESCAPE, "\\")
    .option(MODE, Mode.PERMISSIVE)
    .csv(sourceFile)

// Create a parquet file
dataset.write().parquet("//path.parquet")

Stack trace:
Caused by: java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
    at org.apache.parquet.schema.Types$PrimitiveBuilder.decimalMetadata(Types.java:410)
    at org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:324)
    at org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:250)
    at org.apache.parquet.schema.Types$Builder.named(Types.java:228)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:412)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
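[Editorial aside] A sketch of a workaround consistent with the comment above, assuming the root cause is the inferred negative-scale decimal: cast the column to double (or any decimal with a non-negative scale) before writing, so Parquet's schema converter never sees scale -9. Column and path names are placeholders matching the sample.

import org.apache.spark.sql.functions.col

val dataset = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("source.csv")  // placeholder path

// 9.03E+12 can infer as a decimal with negative scale, which Parquet rejects.
val safe = dataset.withColumn("Row", col("Row").cast("double"))
safe.write.parquet("/tmp/out.parquet")  // placeholder output path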
[jira] [Updated] (SPARK-20387) Permissive mode is not replacing corrupt record with null
[ https://issues.apache.org/jira/browse/SPARK-20387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navya Krishnappa updated SPARK-20387:
-------------------------------------
    Description:

When reading the below-mentioned data with "mode" set to PERMISSIVE:

Source File:
String,int,f1,bool1
abc,23111,23.07738,true
abc,23111,23.07738,true
abc,23111,true,true

Source code1:
Dataset dataset = getSqlContext().read()
    .option(PARSER_LIB, "commons")
    .option(INFER_SCHEMA, "true")
    .option(DELIMITER, ",")
    .option(QUOTE, "\"")
    .option(MODE, Mode.PERMISSIVE)
    .csv(sourceFile);
dataset.collect();

Result: an error is thrown

Stack trace:
ERROR Executor: Exception in task 0.0 in stage 15.0 (TID 15)
java.lang.IllegalArgumentException: For input string: "23.07738"
    at scala.collection.immutable.StringLike$class.parseBoolean(StringLike.scala:290)
    at scala.collection.immutable.StringLike$class.toBoolean(StringLike.scala:260)
    at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:29)
    at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:270)
    at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:125)
    at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:94)
    at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:167)
    at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:166)
[jira] [Created] (SPARK-20387) Permissive mode is not replacing corrupt record with null
Navya Krishnappa created SPARK-20387:
-------------------------------------

             Summary: Permissive mode is not replacing corrupt record with null
                 Key: SPARK-20387
                 URL: https://issues.apache.org/jira/browse/SPARK-20387
             Project: Spark
          Issue Type: Bug
          Components: Java API
    Affects Versions: 2.1.0
            Reporter: Navya Krishnappa
[jira] [Comment Edited] (SPARK-18936) Infrastructure for session local timezone support
[ https://issues.apache.org/jira/browse/SPARK-18936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950748#comment-15950748 ]

Navya Krishnappa edited comment on SPARK-18936 at 3/31/17 11:51 AM:
--------------------------------------------------------------------

I think this fix helps us to set the time zone in the Spark configuration. If so, can we set "UTC" as the time zone? Let me know if I have misunderstood the document.

> Infrastructure for session local timezone support
> --------------------------------------------------
>
>                 Key: SPARK-18936
>                 URL: https://issues.apache.org/jira/browse/SPARK-18936
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Takuya Ueshin
>             Fix For: 2.2.0
[jira] [Commented] (SPARK-18936) Infrastructure for session local timezone support
[ https://issues.apache.org/jira/browse/SPARK-18936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950748#comment-15950748 ]

Navya Krishnappa commented on SPARK-18936:
------------------------------------------

I think this fix helps us to set the time zone in the Spark configuration. If so, can we set the "UTC" time zone? Let me know if I have misunderstood the document.
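[Editorial aside] For what the comment asks, a one-line check, assuming the configuration landed as described in this sub-task: the property accepts any JDK time-zone ID, so "UTC" is a legal value, scoped to the current session.

// Set and read back the session-local timezone in spark-shell.
spark.conf.set("spark.sql.session.timeZone", "UTC")
println(spark.conf.get("spark.sql.session.timeZone"))  // prints UTC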
[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
[ https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950745#comment-15950745 ] Navya Krishnappa commented on SPARK-20152: -- [~srowen] & [~hyukjin.kwon] Thank you for your comments. > Time zone is not respected while parsing csv for timeStampFormat > "MM-dd-'T'HH:mm:ss.SSSZZ" > -- > > Key: SPARK-20152 > URL: https://issues.apache.org/jira/browse/SPARK-20152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Navya Krishnappa > > When reading the below mentioned time value by specifying the > "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored. > Source File: > TimeColumn > 03-21-2017T03:30:02Z > Source code1: > Dataset dataset = getSqlContext().read() > .option(PARSER_LIB, "commons") > .option(INFER_SCHEMA, "true") > .option(DELIMITER, ",") > .option(QUOTE, "\"") > .option(ESCAPE, "\\") > .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ") > .option(MODE, Mode.PERMISSIVE) > .csv(sourceFile); > Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but > expected result is TimeCoumn should be of "TimestampType" and should > consider time zone for manipulation > Source code2: > Dataset dataset = getSqlContext().read() > .option(PARSER_LIB, "commons") > .option(INFER_SCHEMA, "true") > .option(DELIMITER, ",") > .option(QUOTE, "\"") > .option(ESCAPE, "\\") > .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") > .option(MODE, Mode.PERMISSIVE) > .csv(sourceFile); > Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but > expected result is TimeCoumn should consider time zone for manipulation -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
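Until the format handling is settled, one common workaround can be sketched as follows (assumptions: a SparkSession named "spark", a local/single-JVM run so the default-zone call is visible to the parser, and Spark 2.1's SimpleDateFormat-style pattern letters; this is not the fix discussed in the thread): read the column as a plain string, then convert it with unix_timestamp(), spelling out yyyy, consuming the trailing Z as a quoted literal, and pinning the JVM zone to UTC so the parse is zone-stable.

import static org.apache.spark.sql.functions.*;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import java.util.TimeZone;

TimeZone.setDefault(TimeZone.getTimeZone("UTC"));  // zone-stable parsing (local mode)

Dataset<Row> raw = spark.read()
    .option("header", "true")
    .csv(sourceFile);                              // TimeColumn stays StringType

// The full year is spelled out and the trailing Z is a quoted literal.
Dataset<Row> withTs = raw.withColumn("TimeColumn",
    unix_timestamp(col("TimeColumn"), "MM-dd-yyyy'T'HH:mm:ss'Z'")
        .cast("timestamp"));
withTs.printSchema();                              // TimeColumn: timestamp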
[jira] [Comment Edited] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
[ https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949399#comment-15949399 ] Navya Krishnappa edited comment on SPARK-20152 at 3/30/17 4:53 PM: --- But if we specify timestampFormat "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" and parse "2017-03-21T00:00:00Z", it works fine. The same scenario does not work when parsing "03-21-2017T03:30:02Z" with the "MM-dd-yyyy'T'HH:mm:ss.SSSZZ" format. Let me know if my inputs are wrong. was (Author: navya krishnappa): But if we specify timestampformat: "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" and parse "2017-03-21T00:00:00Z", it is working fine. Same scenario is not applied while parsing "03-21-2017T03:30:02Z" with "MM-dd-yyyy'T'HH:mm:ss.SSSZZ" format. Let me know if my inputs are wrong. > Time zone is not respected while parsing csv for timeStampFormat > "MM-dd-yyyy'T'HH:mm:ss.SSSZZ" > -- > > Key: SPARK-20152 > URL: https://issues.apache.org/jira/browse/SPARK-20152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Navya Krishnappa > > When reading the below mentioned time value by specifying the > "timestampFormat": "MM-dd-yyyy'T'HH:mm:ss.SSSZZ", time zone is ignored. > Source File: > TimeColumn > 03-21-2017T03:30:02Z > Source code1: > Dataset dataset = getSqlContext().read() > .option(PARSER_LIB, "commons") > .option(INFER_SCHEMA, "true") > .option(DELIMITER, ",") > .option(QUOTE, "\"") > .option(ESCAPE, "\\") > .option("timestampFormat" , "MM-dd-yyyy'T'HH:mm:ss.SSSZZ") > .option(MODE, Mode.PERMISSIVE) > .csv(sourceFile); > Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but > expected result is TimeColumn should be of "TimestampType" and should > consider time zone for manipulation > Source code2: > Dataset dataset = getSqlContext().read() > .option(PARSER_LIB, "commons") > .option(INFER_SCHEMA, "true") > .option(DELIMITER, ",") > .option(QUOTE, "\"") > .option(ESCAPE, "\\") > .option("timestampFormat" , "MM-dd-yyyy'T'HH:mm:ss") > .option(MODE, Mode.PERMISSIVE) > .csv(sourceFile); > Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but > expected result is TimeColumn should consider time zone for manipulation -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
[ https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949399#comment-15949399 ] Navya Krishnappa commented on SPARK-20152: -- But if we specify timestampFormat "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" and parse "2017-03-21T00:00:00Z", it works fine. The same scenario does not work when parsing "03-21-2017T03:30:02Z" with the "MM-dd-yyyy'T'HH:mm:ss.SSSZZ" format. Let me know if my inputs are wrong. > Time zone is not respected while parsing csv for timeStampFormat > "MM-dd-yyyy'T'HH:mm:ss.SSSZZ" > -- > > Key: SPARK-20152 > URL: https://issues.apache.org/jira/browse/SPARK-20152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Navya Krishnappa > > When reading the below mentioned time value by specifying the > "timestampFormat": "MM-dd-yyyy'T'HH:mm:ss.SSSZZ", time zone is ignored. > Source File: > TimeColumn > 03-21-2017T03:30:02Z > Source code1: > Dataset dataset = getSqlContext().read() > .option(PARSER_LIB, "commons") > .option(INFER_SCHEMA, "true") > .option(DELIMITER, ",") > .option(QUOTE, "\"") > .option(ESCAPE, "\\") > .option("timestampFormat" , "MM-dd-yyyy'T'HH:mm:ss.SSSZZ") > .option(MODE, Mode.PERMISSIVE) > .csv(sourceFile); > Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but > expected result is TimeColumn should be of "TimestampType" and should > consider time zone for manipulation > Source code2: > Dataset dataset = getSqlContext().read() > .option(PARSER_LIB, "commons") > .option(INFER_SCHEMA, "true") > .option(DELIMITER, ",") > .option(QUOTE, "\"") > .option(ESCAPE, "\\") > .option("timestampFormat" , "MM-dd-yyyy'T'HH:mm:ss") > .option(MODE, Mode.PERMISSIVE) > .csv(sourceFile); > Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but > expected result is TimeColumn should consider time zone for manipulation -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
[ https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948989#comment-15948989 ] Navya Krishnappa edited comment on SPARK-20152 at 3/30/17 12:48 PM: According to Spark, "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" is the default timestamp format. In the above-mentioned example, I have swapped the date fields, and I am using valid pattern letters in my format. was (Author: navya krishnappa): According to the spark "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" is default timestamp format. In examples, i have swapped the date fields. And I'm using valid letters in my format. > Time zone is not respected while parsing csv for timeStampFormat > "MM-dd-yyyy'T'HH:mm:ss.SSSZZ" > -- > > Key: SPARK-20152 > URL: https://issues.apache.org/jira/browse/SPARK-20152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Navya Krishnappa > > When reading the below mentioned time value by specifying the > "timestampFormat": "MM-dd-yyyy'T'HH:mm:ss.SSSZZ", time zone is ignored. > Source File: > TimeColumn > 03-21-2017T03:30:02Z > Source code1: > Dataset dataset = getSqlContext().read() > .option(PARSER_LIB, "commons") > .option(INFER_SCHEMA, "true") > .option(DELIMITER, ",") > .option(QUOTE, "\"") > .option(ESCAPE, "\\") > .option("timestampFormat" , "MM-dd-yyyy'T'HH:mm:ss.SSSZZ") > .option(MODE, Mode.PERMISSIVE) > .csv(sourceFile); > Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but > expected result is TimeColumn should be of "TimestampType" and should > consider time zone for manipulation > Source code2: > Dataset dataset = getSqlContext().read() > .option(PARSER_LIB, "commons") > .option(INFER_SCHEMA, "true") > .option(DELIMITER, ",") > .option(QUOTE, "\"") > .option(ESCAPE, "\\") > .option("timestampFormat" , "MM-dd-yyyy'T'HH:mm:ss") > .option(MODE, Mode.PERMISSIVE) > .csv(sourceFile); > Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but > expected result is TimeColumn should consider time zone for manipulation -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
[ https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15948989#comment-15948989 ] Navya Krishnappa commented on SPARK-20152: -- According to Spark, "yyyy-MM-dd'T'HH:mm:ss.SSSZZ" is the default timestamp format. In the examples, I have swapped the date fields, and I am using valid pattern letters in my format. > Time zone is not respected while parsing csv for timeStampFormat > "MM-dd-yyyy'T'HH:mm:ss.SSSZZ" > -- > > Key: SPARK-20152 > URL: https://issues.apache.org/jira/browse/SPARK-20152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Navya Krishnappa > > When reading the below mentioned time value by specifying the > "timestampFormat": "MM-dd-yyyy'T'HH:mm:ss.SSSZZ", time zone is ignored. > Source File: > TimeColumn > 03-21-2017T03:30:02Z > Source code1: > Dataset dataset = getSqlContext().read() > .option(PARSER_LIB, "commons") > .option(INFER_SCHEMA, "true") > .option(DELIMITER, ",") > .option(QUOTE, "\"") > .option(ESCAPE, "\\") > .option("timestampFormat" , "MM-dd-yyyy'T'HH:mm:ss.SSSZZ") > .option(MODE, Mode.PERMISSIVE) > .csv(sourceFile); > Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but > expected result is TimeColumn should be of "TimestampType" and should > consider time zone for manipulation > Source code2: > Dataset dataset = getSqlContext().read() > .option(PARSER_LIB, "commons") > .option(INFER_SCHEMA, "true") > .option(DELIMITER, ",") > .option(QUOTE, "\"") > .option(ESCAPE, "\\") > .option("timestampFormat" , "MM-dd-yyyy'T'HH:mm:ss") > .option(MODE, Mode.PERMISSIVE) > .csv(sourceFile); > Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but > expected result is TimeColumn should consider time zone for manipulation -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-18877) Unable to read given csv data. Exception: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
[ https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756554#comment-15756554 ] Navya Krishnappa edited comment on SPARK-18877 at 3/30/17 12:45 PM: Thank you for replying, [~dongjoon]. Can you help me understand whether the above-mentioned PR will resolve the issue described below? I have another issue with respect to the decimal scale. When I try to read the below-mentioned CSV source file and create a Parquet file from it, a java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9 exception is thrown. The source file content is Row(column name) 9.03E+12 1.19E+11 Refer to the code below, used to read the CSV file and create the Parquet file: //Read the csv file Dataset dataset = getSqlContext().read() .option(HEADER, "true") .option(PARSER_LIB, "commons") .option(INFER_SCHEMA, "true") .option(DELIMITER, ",") .option(QUOTE, "\"") .option(ESCAPE, "\\") .option(MODE, Mode.PERMISSIVE) .csv(sourceFile) // create a parquet file dataset.write().parquet("//path.parquet") Stack trace: Caused by: java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9 at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55) at org.apache.parquet.schema.Types$PrimitiveBuilder.decimalMetadata(Types.java:410) at org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:324) at org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:250) at org.apache.parquet.schema.Types$Builder.named(Types.java:228) at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:412) at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321) at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313) at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at org.apache.spark.sql.types.StructType.foreach(StructType.scala:95) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at org.apache.spark.sql.types.StructType.map(StructType.scala:95) at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convert(ParquetSchemaConverter.scala:313) at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.init(ParquetWriteSupport.scala:85) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:288) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262) at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetFileFormat.scala:562) at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:139) at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131) at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) at org.apache.spark.scheduler.Task.run(Task.scala:86) was (Author: navya krishnappa): Thank you for replying [~dongjoon]. Can you help me in understanding whether the above mentioned PR will resolve the below mentioned issue. I have another issue with respect to the decimal scale. When i'm trying to read the below mentioned csv source file and creating an parquet file from that throws an java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9 exception. The source file content is Row(column name) 9.03E+12 1.19E+11 Refer the given code used read the csv file and creating an parquet file: //Read the csv file Dataset dataset = getSqlContext().read() .option(DAWBConstant.HEADER, "true") .option(DAWBConstant.PARSER_LIB, "commons") .
[jira] [Updated] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
[ https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navya Krishnappa updated SPARK-20152: - Description: When reading the below mentioned time value by specifying the "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored. Source File: TimeColumn 03-21-2017T03:30:02Z Source code1: Dataset dataset = getSqlContext().read() .option(PARSER_LIB, "commons") .option(INFER_SCHEMA, "true") .option(DELIMITER, ",") .option(QUOTE, "\"") .option(ESCAPE, "\\") .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ") .option(MODE, Mode.PERMISSIVE) .csv(sourceFile); Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but expected result is TimeCoumn should be of "TimestampType" and should consider time zone for manipulation Source code2: Dataset dataset = getSqlContext().read() .option(PARSER_LIB, "commons") .option(INFER_SCHEMA, "true") .option(DELIMITER, ",") .option(QUOTE, "\"") .option(ESCAPE, "\\") .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") .option(MODE, Mode.PERMISSIVE) .csv(sourceFile); Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but expected result is TimeCoumn should consider time zone for manipulation was: When reading the below mentioned time value by specifying the "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored. Source File: TimeColumn 03-21-2017T03:30:02Z Source code1: Dataset dataset = getSqlContext().read() .option(DAWBConstant.PARSER_LIB, "commons") .option(INFER_SCHEMA, "true") .option(DAWBConstant.DELIMITER, ",") .option(DAWBConstant.QUOTE, "\"") .option(DAWBConstant.ESCAPE, "\\") .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ") .option(DAWBConstant.MODE, Mode.PERMISSIVE) .csv(sourceFile); Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but expected result is TimeCoumn should be of "TimestampType" and should consider time zone for manipulation Source code2: Dataset dataset = getSqlContext().read() .option(DAWBConstant.PARSER_LIB, "commons") .option(INFER_SCHEMA, "true") .option(DAWBConstant.DELIMITER, ",") .option(DAWBConstant.QUOTE, "\"") .option(DAWBConstant.ESCAPE, "\\") .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") .option(DAWBConstant.MODE, Mode.PERMISSIVE) .csv(sourceFile); Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but expected result is TimeCoumn should consider time zone for manipulation > Time zone is not respected while parsing csv for timeStampFormat > "MM-dd-'T'HH:mm:ss.SSSZZ" > -- > > Key: SPARK-20152 > URL: https://issues.apache.org/jira/browse/SPARK-20152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Navya Krishnappa > > When reading the below mentioned time value by specifying the > "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored. 
> Source File: > TimeColumn > 03-21-2017T03:30:02Z > Source code1: > Dataset dataset = getSqlContext().read() > .option(PARSER_LIB, "commons") > .option(INFER_SCHEMA, "true") > .option(DELIMITER, ",") > .option(QUOTE, "\"") > .option(ESCAPE, "\\") > .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ") > .option(MODE, Mode.PERMISSIVE) > .csv(sourceFile); > Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but > expected result is TimeCoumn should be of "TimestampType" and should > consider time zone for manipulation > Source code2: > Dataset dataset = getSqlContext().read() > .option(PARSER_LIB, "commons") > .option(INFER_SCHEMA, "true") > .option(DELIMITER, ",") > .option(QUOTE, "\"") > .option(ESCAPE, "\\") > .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") > .option(MODE, Mode.PERMISSIVE) > .csv(sourceFile); > Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but > expected result is TimeCoumn should consider time zone for manipulation -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-18877) Unable to read given csv data. Exception: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
[ https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754865#comment-15754865 ] Navya Krishnappa edited comment on SPARK-18877 at 3/30/17 12:44 PM: I'm using SparkContext.read() to read the content. Refer to the code below, used to read the CSV file. Dataset dataset = getSqlContext().read() .option(HEADER, "true") .option(PARSER_LIB, "commons") .option(INFER_SCHEMA, "true") .option(DELIMITER, ",") .option(QUOTE, "\"") .option(ESCAPE, "\\") .option(MODE, Mode.PERMISSIVE) .csv(sourceFile); If we collect the dataset (dataset.collect()), I get a java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20 exception. was (Author: navya krishnappa): I'm using SparkContext.read() to read the content. Refer the given code using to read the csv file. Dataset dataset = getSqlContext().read() .option(DAWBConstant.HEADER, "true") .option(DAWBConstant.PARSER_LIB, "commons") .option(DAWBConstant.INFER_SCHEMA, "true") .option(DAWBConstant.DELIMITER, ",") .option(DAWBConstant.QUOTE, "\"") .option(DAWBConstant.ESCAPE, "\\") .option(DAWBConstant.MODE, Mode.PERMISSIVE) .csv(sourceFile); if we collect the dataset (dataset.collect()). i will get java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20 exception. > Unable to read given csv data. Exception: java.lang.IllegalArgumentException: > requirement failed: Decimal precision 28 exceeds max precision 20 > -- > > Key: SPARK-18877 > URL: https://issues.apache.org/jira/browse/SPARK-18877 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: Navya Krishnappa >Assignee: Dongjoon Hyun > Fix For: 2.0.3, 2.1.1, 2.2.0 > > When reading below mentioned csv data, even though the maximum decimal > precision is 38, following exception is thrown > java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 > exceeds max precision 20 > Decimal > 2323366225312000 > 2433573971400 > 23233662253000 > 23233662253 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
[ https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navya Krishnappa updated SPARK-20152: - Description: When reading the below mentioned time value by specifying the "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored. Source File: TimeColumn 03-21-2017T03:30:02Z Source code1: Dataset dataset = getSqlContext().read() .option(DAWBConstant.PARSER_LIB, "commons") .option(INFER_SCHEMA, "true") .option(DAWBConstant.DELIMITER, ",") .option(DAWBConstant.QUOTE, "\"") .option(DAWBConstant.ESCAPE, "\\") .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ") .option(DAWBConstant.MODE, Mode.PERMISSIVE) .csv(sourceFile); Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but expected result is TimeCoumn should be of "TimestampType" and should consider time zone for manipulation Source code2: Dataset dataset = getSqlContext().read() .option(DAWBConstant.PARSER_LIB, "commons") .option(INFER_SCHEMA, "true") .option(DAWBConstant.DELIMITER, ",") .option(DAWBConstant.QUOTE, "\"") .option(DAWBConstant.ESCAPE, "\\") .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") .option(DAWBConstant.MODE, Mode.PERMISSIVE) .csv(sourceFile); Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but expected result is TimeCoumn should consider time zone for manipulation was: When reading the below mentioned time value by specifying the "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored. Source File: TimeColumn 03-21-2017T03:30:02Z Source code1: Dataset dataset = getSqlContext().read() .option(DAWBConstant.PARSER_LIB, "commons") .option(INFER_SCHEMA, "true") .option(DAWBConstant.DELIMITER, ",") .option(DAWBConstant.QUOTE, "\"") .option(DAWBConstant.ESCAPE, "\\") .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ") .option(DAWBConstant.MODE, Mode.PERMISSIVE) .csv(sourceFile); Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but expected result is TimeCoumn should be of "TimestampType" and should consider time zone for manipulation Source code2: Dataset dataset = getSqlContext().read() .option(DAWBConstant.PARSER_LIB, "commons") .option(INFER_SCHEMA, "true") .option(DAWBConstant.DELIMITER, ",") .option(DAWBConstant.QUOTE, "\"") .option(DAWBConstant.ESCAPE, "\\") .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") .option(DAWBConstant.MODE, Mode.PERMISSIVE) .csv(sourceFile); Result: TimeColumn [ TimestampType] and value is "2017-04-22 03:30:02.0", but expected result is TimeCoumn should consider time zone for manipulation > Time zone is not respected while parsing csv for timeStampFormat > "MM-dd-'T'HH:mm:ss.SSSZZ" > -- > > Key: SPARK-20152 > URL: https://issues.apache.org/jira/browse/SPARK-20152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Navya Krishnappa > > When reading the below mentioned time value by specifying the > "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored. 
> Source File: > TimeColumn > 03-21-2017T03:30:02Z > Source code1: > Dataset dataset = getSqlContext().read() > .option(DAWBConstant.PARSER_LIB, "commons") > .option(INFER_SCHEMA, "true") > .option(DAWBConstant.DELIMITER, ",") > .option(DAWBConstant.QUOTE, "\"") > .option(DAWBConstant.ESCAPE, "\\") > .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ") > .option(DAWBConstant.MODE, Mode.PERMISSIVE) > .csv(sourceFile); > Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but > expected result is TimeCoumn should be of "TimestampType" and should > consider time zone for manipulation > Source code2: > Dataset dataset = getSqlContext().read() > .option(DAWBConstant.PARSER_LIB, "commons") > .option(INFER_SCHEMA, "true") > .option(DAWBConstant.DELIMITER, ",") > .option(DAWBConstant.QUOTE, "\"") > .option(DAWBConstant.ESCAPE, "\\") > .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") > .option(DAWBConstant.MODE, Mode.PERMISSIVE) > .csv(sourceFile); > Result: TimeColumn [ TimestampType] and value is "2017-03-21 03:30:02.0", but > expected result is TimeCoumn should consider time zone for manipulation -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
[ https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navya Krishnappa updated SPARK-20152: - Description: When reading the below mentioned time value by specifying the "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored. Source File: TimeColumn 03-21-2017T03:30:02Z Source code1: Dataset dataset = getSqlContext().read() .option(DAWBConstant.PARSER_LIB, "commons") .option(INFER_SCHEMA, "true") .option(DAWBConstant.DELIMITER, ",") .option(DAWBConstant.QUOTE, "\"") .option(DAWBConstant.ESCAPE, "\\") .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ") .option(DAWBConstant.MODE, Mode.PERMISSIVE) .csv(sourceFile); Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but expected result is TimeCoumn should be of "TimestampType" and should consider time zone for manipulation Source code2: Dataset dataset = getSqlContext().read() .option(DAWBConstant.PARSER_LIB, "commons") .option(INFER_SCHEMA, "true") .option(DAWBConstant.DELIMITER, ",") .option(DAWBConstant.QUOTE, "\"") .option(DAWBConstant.ESCAPE, "\\") .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") .option(DAWBConstant.MODE, Mode.PERMISSIVE) .csv(sourceFile); Result: TimeColumn [ TimestampType] and value is "2017-04-22 03:30:02.0", but expected result is TimeCoumn should consider time zone for manipulation was: When reading the below mentioned time value by specifying the "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored. Source File: TimeColumn 03-21-2017T03:30:02Z Source code: Dataset dataset = getSqlContext().read() .option(DAWBConstant.PARSER_LIB, "commons") .option(INFER_SCHEMA, "true") .option(DAWBConstant.DELIMITER, ",") .option(DAWBConstant.QUOTE, "\"") .option(DAWBConstant.ESCAPE, "\\") .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ") .option(DAWBConstant.MODE, Mode.PERMISSIVE) .csv(sourceFile); Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but expected result is TimeCoumn should be of "TimestampType" and should consider time zone for manipulation > Time zone is not respected while parsing csv for timeStampFormat > "MM-dd-'T'HH:mm:ss.SSSZZ" > -- > > Key: SPARK-20152 > URL: https://issues.apache.org/jira/browse/SPARK-20152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Navya Krishnappa > > When reading the below mentioned time value by specifying the > "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored. 
> Source File: > TimeColumn > 03-21-2017T03:30:02Z > Source code1: > Dataset dataset = getSqlContext().read() > .option(DAWBConstant.PARSER_LIB, "commons") > .option(INFER_SCHEMA, "true") > .option(DAWBConstant.DELIMITER, ",") > .option(DAWBConstant.QUOTE, "\"") > .option(DAWBConstant.ESCAPE, "\\") > .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ") > .option(DAWBConstant.MODE, Mode.PERMISSIVE) > .csv(sourceFile); > Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but > expected result is TimeCoumn should be of "TimestampType" and should > consider time zone for manipulation > Source code2: > Dataset dataset = getSqlContext().read() > .option(DAWBConstant.PARSER_LIB, "commons") > .option(INFER_SCHEMA, "true") > .option(DAWBConstant.DELIMITER, ",") > .option(DAWBConstant.QUOTE, "\"") > .option(DAWBConstant.ESCAPE, "\\") > .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") > .option(DAWBConstant.MODE, Mode.PERMISSIVE) > .csv(sourceFile); > Result: TimeColumn [ TimestampType] and value is "2017-04-22 03:30:02.0", but > expected result is TimeCoumn should consider time zone for manipulation -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
[ https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navya Krishnappa updated SPARK-20152: - Description: When reading the below mentioned time value by specifying the "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored. Source File: TimeColumn 03-21-2017T03:30:02Z Source code: Dataset dataset = getSqlContext().read() .option(DAWBConstant.PARSER_LIB, "commons") .option(INFER_SCHEMA, "true") .option(DAWBConstant.DELIMITER, ",") .option(DAWBConstant.QUOTE, "\"") .option(DAWBConstant.ESCAPE, "\\") .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ") .option(DAWBConstant.MODE, Mode.PERMISSIVE) .csv(sourceFile); Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but expected result is TimeCoumn should be of "TimestampType" and should consider time zone for manipulation was: When reading the below mentioned time value by specifying the "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored. Source File: TimeColumn 03-21-2017T03:30:02Z Source code1: Dataset dataset = getSqlContext().read() .option(DAWBConstant.PARSER_LIB, "commons") .option(INFER_SCHEMA, "true") .option(DAWBConstant.DELIMITER, ",") .option(DAWBConstant.QUOTE, "\"") .option(DAWBConstant.ESCAPE, "\\") .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ") .option(DAWBConstant.MODE, Mode.PERMISSIVE) .csv(sourceFile); Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but expected result is TimeCoumn should be of "TimestampType" and should consider time zone for manipulation Source code2: Dataset dataset = getSqlContext().read() .option(DAWBConstant.PARSER_LIB, "commons") .option(INFER_SCHEMA, "true") .option(DAWBConstant.DELIMITER, ",") .option(DAWBConstant.QUOTE, "\"") .option(DAWBConstant.ESCAPE, "\\") .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") .option(DAWBConstant.MODE, Mode.PERMISSIVE) .csv(sourceFile); Result: TimeColumn [ TimestampType] and value is "2017-04-22 03:30:02.0", but expected result is TimeCoumn should consider time zone for manipulation > Time zone is not respected while parsing csv for timeStampFormat > "MM-dd-'T'HH:mm:ss.SSSZZ" > -- > > Key: SPARK-20152 > URL: https://issues.apache.org/jira/browse/SPARK-20152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Navya Krishnappa > > When reading the below mentioned time value by specifying the > "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored. > Source File: > TimeColumn > 03-21-2017T03:30:02Z > Source code: > Dataset dataset = getSqlContext().read() > .option(DAWBConstant.PARSER_LIB, "commons") > .option(INFER_SCHEMA, "true") > .option(DAWBConstant.DELIMITER, ",") > .option(DAWBConstant.QUOTE, "\"") > .option(DAWBConstant.ESCAPE, "\\") > .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ") > .option(DAWBConstant.MODE, Mode.PERMISSIVE) > .csv(sourceFile); > Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but > expected result is TimeCoumn should be of "TimestampType" and should > consider time zone for manipulation -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
[ https://issues.apache.org/jira/browse/SPARK-20152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navya Krishnappa updated SPARK-20152: - Description: When reading the below mentioned time value by specifying the "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored. Source File: TimeColumn 03-21-2017T03:30:02Z Source code1: Dataset dataset = getSqlContext().read() .option(DAWBConstant.PARSER_LIB, "commons") .option(INFER_SCHEMA, "true") .option(DAWBConstant.DELIMITER, ",") .option(DAWBConstant.QUOTE, "\"") .option(DAWBConstant.ESCAPE, "\\") .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ") .option(DAWBConstant.MODE, Mode.PERMISSIVE) .csv(sourceFile); Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but expected result is TimeCoumn should be of "TimestampType" and should consider time zone for manipulation Source code2: Dataset dataset = getSqlContext().read() .option(DAWBConstant.PARSER_LIB, "commons") .option(INFER_SCHEMA, "true") .option(DAWBConstant.DELIMITER, ",") .option(DAWBConstant.QUOTE, "\"") .option(DAWBConstant.ESCAPE, "\\") .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") .option(DAWBConstant.MODE, Mode.PERMISSIVE) .csv(sourceFile); Result: TimeColumn [ TimestampType] and value is "2017-04-22 03:30:02.0", but expected result is TimeCoumn should consider time zone for manipulation was: When reading the below mentioned time value by specifying the "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored. Sample data: TimeColumn 03-21-2017T03:30:02Z Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z" Expected Result: TimeCoumn should be of "TimestampType" > Time zone is not respected while parsing csv for timeStampFormat > "MM-dd-'T'HH:mm:ss.SSSZZ" > -- > > Key: SPARK-20152 > URL: https://issues.apache.org/jira/browse/SPARK-20152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Navya Krishnappa > > When reading the below mentioned time value by specifying the > "timestampFormat": "MM-dd-'T'HH:mm:ss.SSSZZ", time zone is ignored. > Source File: > TimeColumn > 03-21-2017T03:30:02Z > Source code1: > Dataset dataset = getSqlContext().read() > .option(DAWBConstant.PARSER_LIB, "commons") > .option(INFER_SCHEMA, "true") > .option(DAWBConstant.DELIMITER, ",") > .option(DAWBConstant.QUOTE, "\"") > .option(DAWBConstant.ESCAPE, "\\") > .option("timestampFormat" , "MM-dd-'T'HH:mm:ss.SSSZZ") > .option(DAWBConstant.MODE, Mode.PERMISSIVE) > .csv(sourceFile); > Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z", but > expected result is TimeCoumn should be of "TimestampType" and should > consider time zone for manipulation > Source code2: > Dataset dataset = getSqlContext().read() > .option(DAWBConstant.PARSER_LIB, "commons") > .option(INFER_SCHEMA, "true") > .option(DAWBConstant.DELIMITER, ",") > .option(DAWBConstant.QUOTE, "\"") > .option(DAWBConstant.ESCAPE, "\\") > .option("timestampFormat" , "MM-dd-'T'HH:mm:ss") > .option(DAWBConstant.MODE, Mode.PERMISSIVE) > .csv(sourceFile); > Result: TimeColumn [ TimestampType] and value is "2017-04-22 03:30:02.0", but > expected result is TimeCoumn should consider time zone for manipulation -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-20152) Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ"
Navya Krishnappa created SPARK-20152: Summary: Time zone is not respected while parsing csv for timeStampFormat "MM-dd-yyyy'T'HH:mm:ss.SSSZZ" Key: SPARK-20152 URL: https://issues.apache.org/jira/browse/SPARK-20152 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.0 Reporter: Navya Krishnappa When reading the below-mentioned time value by specifying the "timestampFormat": "MM-dd-yyyy'T'HH:mm:ss.SSSZZ", the time zone is ignored. Sample data: TimeColumn 03-21-2017T03:30:02Z Result: TimeColumn [ StringType] and value is "03-21-2017T03:30:02Z" Expected Result: TimeColumn should be of "TimestampType" -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-19442) Unable to add column to the dataset using Dataset.WithColumn() api
[ https://issues.apache.org/jira/browse/SPARK-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867357#comment-15867357 ] Navya Krishnappa edited comment on SPARK-19442 at 2/15/17 7:04 AM: --- Thank you [~hyukjin.kwon]. It is working as per my requirement. I could create a new column with blank values. :) was (Author: navya krishnappa): Thank you [~hyukjin.kwon]. It is satisfied my requirement. I could create a new column with blank values. :) > Unable to add column to the dataset using Dataset.WithColumn() api > -- > > Key: SPARK-19442 > URL: https://issues.apache.org/jira/browse/SPARK-19442 > Project: Spark > Issue Type: Bug > Components: Java API >Affects Versions: 2.0.2 >Reporter: Navya Krishnappa > > When I'm creating a new column using Dataset.WithColumn() api, Analysis > Exception is thrown. > Dataset.WithColumn() api: > Dataset.withColumn("newColumnName', new > org.apache.spark.sql.Column("newColumnName").cast("int")); > Stacktrace: > cannot resolve '`NewColumn`' given input columns: [abc,xyz ] -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19442) Unable to add column to the dataset using Dataset.WithColumn() api
[ https://issues.apache.org/jira/browse/SPARK-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867357#comment-15867357 ] Navya Krishnappa commented on SPARK-19442: -- Thank you [~hyukjin.kwon]. It satisfied my requirement. I could create a new column with blank values. :) > Unable to add column to the dataset using Dataset.WithColumn() api > -- > > Key: SPARK-19442 > URL: https://issues.apache.org/jira/browse/SPARK-19442 > Project: Spark > Issue Type: Bug > Components: Java API >Affects Versions: 2.0.2 >Reporter: Navya Krishnappa > > When I'm creating a new column using Dataset.WithColumn() api, Analysis > Exception is thrown. > Dataset.WithColumn() api: > Dataset.withColumn("newColumnName", new > org.apache.spark.sql.Column("newColumnName").cast("int")); > Stacktrace: > cannot resolve '`NewColumn`' given input columns: [abc,xyz ] -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19442) Unable to add column to the dataset using Dataset.WithColumn() api
[ https://issues.apache.org/jira/browse/SPARK-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865135#comment-15865135 ] Navya Krishnappa commented on SPARK-19442: -- If the source file has 3 columns: Name | Age | Address Abc | 10 | Bangalore Xyz | 10 | Bangalore then after adding a new column, say "State", the resultant dataset should be: Name | Age | Address | State Abc | 10 | Bangalore | (blank) Xyz | 10 | Bangalore | (blank) > Unable to add column to the dataset using Dataset.WithColumn() api > -- > > Key: SPARK-19442 > URL: https://issues.apache.org/jira/browse/SPARK-19442 > Project: Spark > Issue Type: Bug > Components: Java API >Affects Versions: 2.0.2 >Reporter: Navya Krishnappa > > When I'm creating a new column using Dataset.WithColumn() api, Analysis > Exception is thrown. > Dataset.WithColumn() api: > Dataset.withColumn("newColumnName", new > org.apache.spark.sql.Column("newColumnName").cast("int")); > Stacktrace: > cannot resolve '`NewColumn`' given input columns: [abc,xyz ] -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
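For the record, the pattern that resolves this can be sketched briefly (hedged: the null default is an assumption about what a "blank" State value should be; lit("") would give empty strings instead): withColumn() needs an expression built with functions.lit(), not a Column that names a column which does not exist yet, which is exactly what the "cannot resolve" AnalysisException complains about.

import static org.apache.spark.sql.functions.lit;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Append a new, empty "State" column to the existing dataset.
Dataset<Row> withState = dataset.withColumn("State", lit(null).cast("string"));
withState.printSchema();   // Name, Age, Address, State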
[jira] [Comment Edited] (SPARK-19442) Unable to add column to the dataset using Dataset.WithColumn() api
[ https://issues.apache.org/jira/browse/SPARK-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863216#comment-15863216 ] Navya Krishnappa edited comment on SPARK-19442 at 2/13/17 5:29 AM: --- Thank you for your response. I was able to derive a new column from the existing column, but my intention is to add a new column. was (Author: navya krishnappa): Thank you for your response. It is working as expected. I could able to add a new column to the data set. > Unable to add column to the dataset using Dataset.WithColumn() api > -- > > Key: SPARK-19442 > URL: https://issues.apache.org/jira/browse/SPARK-19442 > Project: Spark > Issue Type: Bug > Components: Java API >Affects Versions: 2.0.2 >Reporter: Navya Krishnappa > > When I'm creating a new column using Dataset.WithColumn() api, Analysis > Exception is thrown. > Dataset.WithColumn() api: > Dataset.withColumn("newColumnName", new > org.apache.spark.sql.Column("newColumnName").cast("int")); > Stacktrace: > cannot resolve '`NewColumn`' given input columns: [abc,xyz ] -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-19442) Unable to add column to the dataset using Dataset.WithColumn() api
[ https://issues.apache.org/jira/browse/SPARK-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navya Krishnappa reopened SPARK-19442: -- > Unable to add column to the dataset using Dataset.WithColumn() api > -- > > Key: SPARK-19442 > URL: https://issues.apache.org/jira/browse/SPARK-19442 > Project: Spark > Issue Type: Bug > Components: Java API >Affects Versions: 2.0.2 >Reporter: Navya Krishnappa > > When I'm creating a new column using Dataset.WithColumn() api, Analysis > Exception is thrown. > Dataset.WithColumn() api: > Dataset.withColumn("newColumnName', new > org.apache.spark.sql.Column("newColumnName").cast("int")); > Stacktrace: > cannot resolve '`NewColumn`' given input columns: [abc,xyz ] -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19442) Unable to add column to the dataset using Dataset.WithColumn() api
[ https://issues.apache.org/jira/browse/SPARK-19442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863216#comment-15863216 ] Navya Krishnappa commented on SPARK-19442: -- Thank you for your response. It is working as expected. I was able to add a new column to the dataset. > Unable to add column to the dataset using Dataset.WithColumn() api > -- > > Key: SPARK-19442 > URL: https://issues.apache.org/jira/browse/SPARK-19442 > Project: Spark > Issue Type: Bug > Components: Java API >Affects Versions: 2.0.2 >Reporter: Navya Krishnappa > > When I'm creating a new column using Dataset.WithColumn() api, Analysis > Exception is thrown. > Dataset.WithColumn() api: > Dataset.withColumn("newColumnName", new > org.apache.spark.sql.Column("newColumnName").cast("int")); > Stacktrace: > cannot resolve '`NewColumn`' given input columns: [abc,xyz ] -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19442) Unable to add column to the dataset using Dataset.WithColumn() api
Navya Krishnappa created SPARK-19442: Summary: Unable to add column to the dataset using Dataset.WithColumn() api Key: SPARK-19442 URL: https://issues.apache.org/jira/browse/SPARK-19442 Project: Spark Issue Type: Bug Components: Java API Affects Versions: 2.0.2 Reporter: Navya Krishnappa When I'm creating a new column using the Dataset.withColumn() API, an AnalysisException is thrown. Dataset.withColumn() API: Dataset.withColumn("newColumnName", new org.apache.spark.sql.Column("newColumnName").cast("int")); Stacktrace: cannot resolve '`NewColumn`' given input columns: [abc,xyz ] -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18962) Unable to create parquet file for the given data
[ https://issues.apache.org/jira/browse/SPARK-18962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navya Krishnappa updated SPARK-18962: - Affects Version/s: 2.0.2 > Unable to create parquet file for the given data > > > Key: SPARK-18962 > URL: https://issues.apache.org/jira/browse/SPARK-18962 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.2 >Reporter: Navya Krishnappa > > When i'm trying to read the below mentioned csv source file and creating an > parquet file from that throws an java.lang.IllegalArgumentException: Invalid > DECIMAL scale: -9 exception. > The source file content is > Row(column name) > 9.03E+12 > 1.19E+11 > Refer the given code used read the csv file and creating an parquet file: > //Read the csv file > Dataset dataset = getSqlContext().read() > .option(DAWBConstant.HEADER, "true") > .option(DAWBConstant.PARSER_LIB, "commons") > .option(DAWBConstant.INFER_SCHEMA, "true") > .option(DAWBConstant.DELIMITER, ",") > .option(DAWBConstant.QUOTE, "\"") > .option(DAWBConstant.ESCAPE, " > ") > .option(DAWBConstant.MODE, Mode.PERMISSIVE) > .csv(sourceFile) > // create an parquet file > dataset.write().parquet("//path.parquet") > Stack trace: > Caused by: java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9 > at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55) > at > org.apache.parquet.schema.Types$PrimitiveBuilder.decimalMetadata(Types.java:410) > at org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:324) > at org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:250) > at org.apache.parquet.schema.Types$Builder.named(Types.java:228) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:412) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at org.apache.spark.sql.types.StructType.foreach(StructType.scala:95) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at org.apache.spark.sql.types.StructType.map(StructType.scala:95) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convert(ParquetSchemaConverter.scala:313) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.init(ParquetWriteSupport.scala:85) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:288) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetFileFormat.scala:562) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:139) > at > 
org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:86) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-18962) Unable to create parquet file for the given data
Navya Krishnappa created SPARK-18962: Summary: Unable to create parquet file for the given data Key: SPARK-18962 URL: https://issues.apache.org/jira/browse/SPARK-18962 Project: Spark Issue Type: Bug Reporter: Navya Krishnappa When I try to read the below-mentioned CSV source file and create a Parquet file from it, a java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9 exception is thrown. The source file content is Row(column name) 9.03E+12 1.19E+11 Refer to the code below, used to read the CSV file and create the Parquet file: //Read the csv file Dataset dataset = getSqlContext().read() .option(DAWBConstant.HEADER, "true") .option(DAWBConstant.PARSER_LIB, "commons") .option(DAWBConstant.INFER_SCHEMA, "true") .option(DAWBConstant.DELIMITER, ",") .option(DAWBConstant.QUOTE, "\"") .option(DAWBConstant.ESCAPE, "\\") .option(DAWBConstant.MODE, Mode.PERMISSIVE) .csv(sourceFile) // create a parquet file dataset.write().parquet("//path.parquet") Stack trace: Caused by: java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9 at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55) at org.apache.parquet.schema.Types$PrimitiveBuilder.decimalMetadata(Types.java:410) at org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:324) at org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:250) at org.apache.parquet.schema.Types$Builder.named(Types.java:228) at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:412) at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321) at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313) at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at org.apache.spark.sql.types.StructType.foreach(StructType.scala:95) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at org.apache.spark.sql.types.StructType.map(StructType.scala:95) at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convert(ParquetSchemaConverter.scala:313) at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.init(ParquetWriteSupport.scala:85) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:288) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262) at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetFileFormat.scala:562) at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:139) at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131) at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) at
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) at org.apache.spark.scheduler.Task.run(Task.scala:86) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
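A hedged workaround sketch for the failing write above (assumptions: a SparkSession named "spark", the single column being named "Row" as in the sample, and a "targetPath" output location; this is illustration, not the reporter's code): values such as 1.19E+11 make schema inference pick a decimal with a negative scale (here -9), which Parquet's schema converter rejects, so re-casting the column to an explicit decimal with a non-negative scale before writing sidesteps the check.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;

Dataset<Row> dataset = spark.read()
    .option("header", "true")
    .option("inferSchema", "true")
    .csv(sourceFile);

// Re-type the inferred negative-scale decimal to a Parquet-friendly one;
// (38, 0) is an assumption about the range the data needs.
Dataset<Row> fixed = dataset.withColumn("Row",
    dataset.col("Row").cast(DataTypes.createDecimalType(38, 0)));

fixed.write().parquet(targetPath);   // targetPath: assumed output location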
[jira] [Comment Edited] (SPARK-18877) Unable to read given csv data. Exception: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
[ https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760487#comment-15760487 ] Navya Krishnappa edited comment on SPARK-18877 at 12/19/16 7:56 AM: Thank you [~dongjoon], and I will create an issue in Apache Parquet. was (Author: navya krishnappa): Thank you [~dongjoon] > Unable to read given csv data. Exception: java.lang.IllegalArgumentException: > requirement failed: Decimal precision 28 exceeds max precision 20 > -- > > Key: SPARK-18877 > URL: https://issues.apache.org/jira/browse/SPARK-18877 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: Navya Krishnappa > > When reading below mentioned csv data, even though the maximum decimal > precision is 38, following exception is thrown > java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 > exceeds max precision 20 > Decimal > 2323366225312000 > 2433573971400 > 23233662253000 > 23233662253 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-18877) Unable to read given csv data. Exception: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
[ https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760487#comment-15760487 ]
Navya Krishnappa edited comment on SPARK-18877 at 12/19/16 7:56 AM:
Thank you [~dongjoon], and I will create an issue in the Apache Parquet JIRA.
was (Author: navya krishnappa): Thank you [~dongjoon] and i will create an issue in Apace parquet.
> Unable to read given csv data. Exception: java.lang.IllegalArgumentException:
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.2
> Reporter: Navya Krishnappa
>
> When reading the below-mentioned CSV data, even though the maximum decimal
> precision is 38, the following exception is thrown:
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18877) Unable to read given csv data. Exception: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
[ https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760487#comment-15760487 ]
Navya Krishnappa commented on SPARK-18877:
--
Thank you [~dongjoon]
> Unable to read given csv data. Exception: java.lang.IllegalArgumentException:
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.2
> Reporter: Navya Krishnappa
>
> When reading the below-mentioned CSV data, even though the maximum decimal
> precision is 38, the following exception is thrown:
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18877) Unable to read given csv data. Exception: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
[ https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756554#comment-15756554 ]
Navya Krishnappa commented on SPARK-18877:
--
Thank you for replying, [~dongjoon]. Can you help me understand whether the above-mentioned PR will also resolve the issue described below? I have another issue with respect to the decimal scale. When reading the below-mentioned CSV source file and writing it out as a Parquet file, a java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9 is thrown.
The source file content is
Row(column name)
9.03E+12
1.19E+11
Refer to the code below, used to read the CSV file and write the Parquet file:
// Read the CSV file
Dataset dataset = getSqlContext().read()
.option(DAWBConstant.HEADER, "true")
.option(DAWBConstant.PARSER_LIB, "commons")
.option(DAWBConstant.INFER_SCHEMA, "true")
.option(DAWBConstant.DELIMITER, ",")
.option(DAWBConstant.QUOTE, "\"")
.option(DAWBConstant.ESCAPE, "\\")
.option(DAWBConstant.MODE, Mode.PERMISSIVE)
.csv(sourceFile);
// Create a Parquet file
dataset.write().parquet("//path.parquet");
Stack trace:
Caused by: java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
at org.apache.parquet.schema.Types$PrimitiveBuilder.decimalMetadata(Types.java:410)
at org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:324)
at org.apache.parquet.schema.Types$PrimitiveBuilder.build(Types.java:250)
at org.apache.parquet.schema.Types$Builder.named(Types.java:228)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:412)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convert$1.apply(ParquetSchemaConverter.scala:313)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:95)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at org.apache.spark.sql.types.StructType.map(StructType.scala:95)
at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convert(ParquetSchemaConverter.scala:313)
at org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.init(ParquetWriteSupport.scala:85)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:288)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetFileFormat.scala:562)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:139)
at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
> Unable to read given csv data. Exception: java.lang.IllegalArgumentException:
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.2
> Reporter: Navya Krishnappa
>
> When reading the below-mentioned CSV data, even though the maximum decimal
> precision is 38, the following exception is thrown:
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
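Since this comment reports the same write failure, one quick check before calling write() is to print the inferred schema and look for the negative-scale decimal. A minimal sketch, reusing the dataset variable from the code above; the exact type shown is taken from this thread's reports, not verified independently:

// Print the schema inferred from the CSV; per this report it should show a
// negative-scale type such as decimal(3,-9) for the scientific-notation column.
dataset.printSchema();
// Expected shape of the output (column name from the report):
// root
//  |-- Row: decimal(3,-9) (nullable = true)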
[jira] [Commented] (SPARK-18877) Unable to read given csv data. Exception: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
[ https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754865#comment-15754865 ]
Navya Krishnappa commented on SPARK-18877:
--
I'm using getSqlContext().read() to read the content. Refer to the code below, used to read the CSV file:
Dataset dataset = getSqlContext().read()
.option(DAWBConstant.HEADER, "true")
.option(DAWBConstant.PARSER_LIB, "commons")
.option(DAWBConstant.INFER_SCHEMA, "true")
.option(DAWBConstant.DELIMITER, ",")
.option(DAWBConstant.QUOTE, "\"")
.option(DAWBConstant.ESCAPE, "\\")
.option(DAWBConstant.MODE, Mode.PERMISSIVE)
.csv(sourceFile);
If we collect the dataset (dataset.collect()), the java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20 exception is thrown.
> Unable to read given csv data. Exception: java.lang.IllegalArgumentException:
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
> Issue Type: Bug
> Affects Versions: 2.0.2
> Reporter: Navya Krishnappa
>
> When reading the below-mentioned CSV data, even though the maximum decimal
> precision is 38, the following exception is thrown:
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
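One way to avoid the mis-inferred decimal entirely is to skip inferSchema and declare the schema up front. A sketch in the style of the code above, under the assumption that the file has a single column named Decimal (the name used in the issue's sample data); whether this avoids the error is an expectation, not a tested result:

import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// Declare the column as decimal(38,0) rather than letting inference pick
// a narrower type such as decimal(20,0) from the rows it samples.
StructType schema = new StructType()
    .add("Decimal", DataTypes.createDecimalType(38, 0));

Dataset dataset = getSqlContext().read()
    .option(DAWBConstant.HEADER, "true")
    .schema(schema) // replaces .option(DAWBConstant.INFER_SCHEMA, "true")
    .csv(sourceFile);

dataset.collect(); // the precision check is now against 38, not 20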
[jira] [Commented] (SPARK-18877) Unable to read given csv data. Exception: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
[ https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753728#comment-15753728 ]
Navya Krishnappa commented on SPARK-18877:
--
Precision and scale vary depending on the decimal values in the column. Suppose the source file contains
Amount(column name)
9.03E+12
1.19E+11
24335739714
1.71E+11
Then Spark considers the Amount column to be decimal(3,-9) and throws the exception below:
Caused by: java.lang.IllegalArgumentException: requirement failed: Decimal precision 4 exceeds max precision 3
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.sql.types.Decimal.set(Decimal.scala:112)
at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:425)
at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:264)
at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:116)
> Unable to read given csv data. Exception: java.lang.IllegalArgumentException:
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
> Issue Type: Bug
> Affects Versions: 2.0.2
> Reporter: Navya Krishnappa
>
> When reading the below-mentioned CSV data, even though the maximum decimal
> precision is 38, the following exception is thrown:
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
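The negative scale reported above is consistent with how java.math.BigDecimal parses scientific notation, which is what per-value type inference works from. A small standalone illustration using the values from this comment:

import java.math.BigDecimal;

public class ScaleDemo {
  public static void main(String[] args) {
    // 9.03E+12 parses as unscaled value 903 with scale -10 (903 x 10^10)
    BigDecimal a = new BigDecimal("9.03E+12");
    System.out.println(a.precision() + "," + a.scale()); // prints 3,-10

    // 1.19E+11 parses as unscaled value 119 with scale -9 (119 x 10^9)
    BigDecimal b = new BigDecimal("1.19E+11");
    System.out.println(b.precision() + "," + b.scale()); // prints 3,-9

    // A plain integer like 24335739714 has precision 11 and scale 0, so
    // the column mixes per-value types with very different shapes.
    BigDecimal c = new BigDecimal("24335739714");
    System.out.println(c.precision() + "," + c.scale()); // prints 11,0
  }
}

Reconciling those per-value types into a single column type is evidently where the inference in this version goes wrong, producing decimal(3,-9).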
[jira] [Commented] (SPARK-18877) Unable to read given csv data. Exception: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
[ https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15753716#comment-15753716 ]
Navya Krishnappa commented on SPARK-18877:
--
I'm reading through the CSV reader (.csv(sourceFile)) and I'm not setting any precision or scale; Spark automatically detects the precision and scale of the values in the source file, and they vary depending on the decimal values in the column.
Stack trace:
Caused by: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.sql.types.Decimal.set(Decimal.scala:112)
at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:425)
at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:264)
at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:116)
at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:85)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:128)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:127)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:128)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 common frames omitted
> Unable to read given csv data. Exception: java.lang.IllegalArgumentException:
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
> Issue Type: Bug
> Affects Versions: 2.0.2
> Reporter: Navya Krishnappa
>
> When reading the below-mentioned CSV data, even though the maximum decimal
> precision is 38, the following exception is thrown:
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
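If a hand-written schema is impractical, another hedged workaround is to read the column as a plain string (inference off) and cast it explicitly afterwards, keeping the target precision under your control. A fragment in the style of the code in this thread; variable names and the decimal(38,0) target are assumptions:

// Read without INFER_SCHEMA so every column arrives as a string.
Dataset raw = getSqlContext().read()
    .option(DAWBConstant.HEADER, "true")
    .csv(sourceFile);

// Cast the string column to a decimal wide enough for the data.
Dataset typed = raw.withColumn(
    "Decimal", raw.col("Decimal").cast(DataTypes.createDecimalType(38, 0)));

typed.collect(); // the cast target is explicit, so no inferred decimal(20,0)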
[jira] [Updated] (SPARK-18877) Unable to read given csv data. Exception: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
[ https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Navya Krishnappa updated SPARK-18877:
-
Affects Version/s: 2.0.2
> Unable to read given csv data. Exception: java.lang.IllegalArgumentException:
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
> Issue Type: Bug
> Affects Versions: 2.0.2
> Reporter: Navya Krishnappa
>
> When reading the below-mentioned CSV data, even though the maximum decimal
> precision is 38, the following exception is thrown:
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-18877) Unable to read given csv data. Exception: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
Navya Krishnappa created SPARK-18877:
Summary: Unable to read given csv data. Exception: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
Key: SPARK-18877
URL: https://issues.apache.org/jira/browse/SPARK-18877
Project: Spark
Issue Type: Bug
Reporter: Navya Krishnappa

When reading the below-mentioned CSV data, even though the maximum decimal precision is 38, the following exception is thrown:
java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20
Decimal
2323366225312000
2433573971400
23233662253000
23233662253
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
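As a side note, every value in the sample above is an integer of at most 16 digits, well within the range of a 64-bit long (about 9.2E18), so casting to LongType is a possible alternative when exact decimal typing is not required. A hypothetical fragment in the style of the code earlier in this thread, reusing a dataset read as above:

// Cast the sample column to long; safe here because the largest sample value,
// 2323366225312000, is far below Long.MAX_VALUE (9223372036854775807).
Dataset asLong = dataset.withColumn(
    "Decimal", dataset.col("Decimal").cast(DataTypes.LongType));
asLong.show();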