Thanks.

Erin Sobkow, BA Kin, RMT
Community Consultant
Parkland Valley Sport, Culture & Recreation District

Box 263, Yorkton, SK S3N 2V7
Phone: (306) 786-6585
Fax: (306) 782-0474
Email: esob...@parklandvalley.ca
Website: www.parklandvalley.ca

If you no longer wish to receive electronic messages from Parkland Valley Sport, Culture & Recreation District please reply with the word 'STOP'.

Together...building healthy communities through sport, culture and recreation

-----Original Message-----
From: Wes McKinney [mailto:wesmck...@gmail.com]
Sent: August 16, 2017 11:30 AM
To: dev@arrow.apache.org
Subject: Re: Major difference between Spark and Arrow Parquet Implementations

hi Erin -- please send a separate e-mail to dev-unsubscr...@arrow.apache.org

Thanks

On Wed, Aug 16, 2017 at 1:06 PM, Erin Sobkow <esob...@parklandvalley.ca> wrote:
> Hi Wes:
>
> Somehow I have been inadvertently added to your list and am getting all these
> emails that make no sense to me at all. I'm in on some conversation I know
> nothing about and am getting up to 20 emails a day from different people.
> Can I ask you to remove me from your list and can you get all the other
> people in your group to remove me as well? Thanks!
>
> Erin Sobkow, BA Kin, RMT
> Community Consultant
> Parkland Valley Sport, Culture & Recreation District
>
> Box 263, Yorkton, SK S3N 2V7
> Phone: (306) 786-6585
> Fax: (306) 782-0474
> Email: esob...@parklandvalley.ca
> Website: www.parklandvalley.ca
>
> If you no longer wish to receive electronic messages from Parkland Valley
> Sport, Culture & Recreation District please reply with the word 'STOP'.
>
> Together...building healthy communities through sport, culture and recreation
>
> -----Original Message-----
> From: Wes McKinney [mailto:wesmck...@gmail.com]
> Sent: August 16, 2017 10:04 AM
> To: dev@arrow.apache.org
> Subject: Re: Major difference between Spark and Arrow Parquet Implementations
>
> hi Lucas,
>
> My understanding is that the Parquet format by itself does not place any such
> restrictions on the names of fields, and so this is a Spark SQL-specific
> issue (anyone please correct me if I'm mistaken about this). I would be happy
> to help add a schema cleaning option to normalize field names for use in
> Spark. I just opened:
>
> https://issues.apache.org/jira/browse/ARROW-1359
>
> Thanks
> Wes
>
> On Wed, Aug 16, 2017 at 11:58 AM, Lucas Pickup
> <lucas.pic...@microsoft.com.invalid> wrote:
>> Hello,
>>
>> I have been using pyarrow and PySpark to write Parquet files. I have used
>> pyarrow to successfully write out a Parquet file with spaces in column
>> names, e.g. 'X Coordinate'.
>> When I try to write out the same dataset using Spark's Parquet writer, it
>> fails, claiming:
>> "Attribute name "X Coordinate" contains invalid character(s) among " ,;{}()\\n\\t="".
>> It seems that, according to Spark's Parquet implementation, the characters
>> above are not allowed to be part of a Parquet schema because they have
>> special meaning.
>> The code that checks this is here:
>> <https://github.com/apache/spark/blob/cba826d00173a945b0c9a7629c66e36fa73b723e/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L565>
>>
>> I was wondering if there is a reason why the implementations differ so
>> much when it comes to schema generation?
>>
>> Cheers, Lucas Pickup
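
For illustration only, here is a minimal sketch of the kind of field-name normalization discussed in the thread. It is not the implementation tracked in ARROW-1359 and not part of pyarrow or Spark. The names `SPARK_INVALID_CHARS`, `clean_field_name` and `clean_field_names` are hypothetical, the character set is taken from the error message quoted above, and replacing with an underscore is an assumption.

```python
# Characters rejected by Spark's check, per the error message quoted above.
# (Hypothetical constant; not defined by pyarrow or Spark.)
SPARK_INVALID_CHARS = " ,;{}()\n\t="


def clean_field_name(name, replacement="_"):
    """Replace each character Spark rejects with `replacement`."""
    return "".join(
        replacement if ch in SPARK_INVALID_CHARS else ch for ch in name
    )


def clean_field_names(names, replacement="_"):
    """Clean a list of field names, keeping the results unique.

    If two cleaned names collide, a numeric suffix is appended to the
    later one so that no column is silently shadowed.
    """
    seen = set()
    cleaned = []
    for name in names:
        base = clean_field_name(name, replacement)
        candidate = base
        suffix = 1
        while candidate in seen:
            candidate = f"{base}{replacement}{suffix}"
            suffix += 1
        seen.add(candidate)
        cleaned.append(candidate)
    return cleaned


if __name__ == "__main__":
    print(clean_field_names(["X Coordinate", "Y Coordinate", "id"]))
```

The cleaned names could then be applied to the data before it is handed to Spark, for example by renaming the columns of the pyarrow table (assuming your pyarrow version provides `Table.rename_columns`).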