[ https://issues.apache.org/jira/browse/SPARK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
cen yuhai updated SPARK-5192:
-----------------------------
    Summary: Parquet fails to parse schema contains '\r'  (was: Parquet fails to parse schemas contains '\r')

> Parquet fails to parse schema contains '\r'
> -------------------------------------------
>
>                 Key: SPARK-5192
>                 URL: https://issues.apache.org/jira/browse/SPARK-5192
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0
>         Environment: Windows 7 + IntelliJ IDEA 13.0.2
>            Reporter: cen yuhai
>            Priority: Critical
>             Fix For: 1.3.0
>
>
> I think this is actually a bug in Parquet. While debugging 'ParquetTestData',
> I hit the exception below, so I downloaded the source of MessageTypeParser
> and found that its 'isWhitespace' function does not check for '\r':
>
>   private boolean isWhitespace(String t) {
>     return t.equals(" ") || t.equals("\t") || t.equals("\n");
>   }
>
> Because the test source has Windows (CRLF) line endings, the multi-line
> schema string contains '\r' characters. The parser returns each '\r' as a
> token, and asRepetition, which expects a repetition such as 'optional',
> fails on it.
>
> As a workaround, I remove every '\r' from the schema string before parsing:
>
>   val subTestSchema =
>     """
>       message myrecord {
>         optional boolean myboolean;
>         optional int64 mylong;
>       }
>     """.replaceAll("\r", "")
>
> at line 0: message myrecord {
>     at parquet.schema.MessageTypeParser.asRepetition(MessageTypeParser.java:203)
>     at parquet.schema.MessageTypeParser.addType(MessageTypeParser.java:101)
>     at parquet.schema.MessageTypeParser.addGroupTypeFields(MessageTypeParser.java:96)
>     at parquet.schema.MessageTypeParser.parse(MessageTypeParser.java:89)
>     at parquet.schema.MessageTypeParser.parseMessageType(MessageTypeParser.java:79)
>     at org.apache.spark.sql.parquet.ParquetTestData$.writeFile(ParquetTestData.scala:221)
>     at org.apache.spark.sql.parquet.ParquetQuerySuite.beforeAll(ParquetQuerySuite.scala:92)
>     at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
>     at org.apache.spark.sql.parquet.ParquetQuerySuite.beforeAll(ParquetQuerySuite.scala:85)
>     at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
>     at org.apache.spark.sql.parquet.ParquetQuerySuite.run(ParquetQuerySuite.scala:85)
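The fix suggested by the description (treat '\r' as whitespace) can be illustrated with a minimal Java sketch, assuming a schema tokenizer built on java.util.StringTokenizer. The class name CrAwareSchemaTokenizer, its DELIMITERS set, and the stripCarriageReturns helper are hypothetical and are not Parquet's actual MessageTypeParser code. The sketch only shows that once '\r' is both a delimiter and whitespace, a CRLF schema produces the same tokens as the '\r'-stripped one that the replaceAll workaround yields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch, not Parquet's MessageTypeParser: a schema tokenizer
// whose whitespace check also covers '\r', so CRLF line endings do not leak
// into the token stream.
class CrAwareSchemaTokenizer {

    // Illustrative delimiter set (an assumption). '\r' is included so a
    // carriage return is split off as its own token and then skipped.
    private static final String DELIMITERS = " ,;{}()\r\n\t=";

    // The isWhitespace check quoted in the issue, extended with "\r".
    static boolean isWhitespace(String t) {
        return t.equals(" ") || t.equals("\t") || t.equals("\n") || t.equals("\r");
    }

    // Same effect as the reporter's .replaceAll("\r", "") workaround,
    // using a literal (non-regex) replacement.
    static String stripCarriageReturns(String schema) {
        return schema.replace("\r", "");
    }

    // Returns the non-whitespace tokens of a schema string; delimiters such
    // as '{' and ';' are kept as tokens because returnDelims is true.
    static List<String> tokens(String schema) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(schema, DELIMITERS, true);
        while (st.hasMoreTokens()) {
            String t = st.nextToken();
            if (!isWhitespace(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String crlf = "message myrecord {\r\n"
                + "  optional boolean myboolean;\r\n"
                + "  optional int64 mylong;\r\n"
                + "}\r\n";
        // Both print: [message, myrecord, {, optional, boolean, myboolean, ;, optional, int64, mylong, ;, }]
        System.out.println(tokens(crlf));
        System.out.println(tokens(stripCarriageReturns(crlf)));
    }
}
```

Handling '\r' inside the parser spares every caller from normalizing line endings, while stripping '\r' before parsing, as in the workaround, needs no library change.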