Re: dremel paper example schema

2018-10-31 Thread lchorbadjiev
Hi Jorn, thanks for the help. I switched to Apache Parquet 1.8.3, and Spark now loads the Parquet file successfully. Do you have any hints for the other part of my question? What is the correct way to reproduce this schema: message Document { required int64 DocId; optional group Links {

Re: dremel paper example schema

2018-10-31 Thread lchorbadjiev
Hi Jorn, I am using Apache Spark 2.3.1. To create the Parquet file I used Apache Parquet (parquet-mr) 1.10. This does not match the Parquet version used in Apache Spark 2.3.1; if you think this could be the problem, I could try Apache Parquet 1.8.3 instead. I created a

Re: dremel paper example schema

2018-10-30 Thread lchorbadjiev
Hi Gourav, the question is in fact whether there are any limitations in Apache Spark's support for the Parquet file format. The example schema from the Dremel paper is supported in Apache Parquet (using the Apache Parquet Java API). Now I am trying to implement the same schema using Apache

dremel paper example schema

2018-10-29 Thread lchorbadjiev
Hi, I'm trying to reproduce the example from the Dremel paper (https://research.google.com/pubs/archive/36632.pdf) in Apache Spark using pyspark, and I wonder whether it is possible at all. Trying to follow the paper example as closely as possible, I created this document type: from pyspark.sql.types import
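
For readers following the thread, here is a minimal sketch of how the Dremel paper's Document schema might be written in the JSON layout that Spark uses to serialize a StructType. It is not the code from the original message, which is cut off in this archive. The field names beyond DocId and Links follow the Document example in the Dremel paper. The helper functions field, struct and array are hypothetical names introduced only for this illustration, and the mapping of Parquet "repeated" fields to non-nullable arrays is an assumption, so a file written by Spark from this schema may not have exactly the same physical Parquet layout as one written with parquet-mr.

```python
import json


def field(name, dtype, nullable):
    """Hypothetical helper: one field entry in Spark's JSON schema layout."""
    return {"name": name, "type": dtype, "nullable": nullable, "metadata": {}}


def struct(*fields):
    """Hypothetical helper: a struct type made of the given fields."""
    return {"type": "struct", "fields": list(fields)}


def array(element, contains_null=False):
    """Hypothetical helper: an array type, used here to approximate 'repeated'."""
    return {"type": "array", "elementType": element, "containsNull": contains_null}


# repeated group Language { required string Code; optional string Country; }
language = struct(
    field("Code", "string", False),
    field("Country", "string", True),
)

# repeated group Name { repeated group Language {...}; optional string Url; }
name = struct(
    field("Language", array(language), False),
    field("Url", "string", True),
)

# optional group Links { repeated int64 Backward; repeated int64 Forward; }
links = struct(
    field("Backward", array("long"), False),
    field("Forward", array("long"), False),
)

# message Document { required int64 DocId; optional group Links; repeated group Name; }
document_schema = struct(
    field("DocId", "long", False),
    field("Links", links, True),
    field("Name", array(name), False),
)

print(json.dumps(document_schema, indent=2))
```

With PySpark installed, passing this dictionary to StructType.fromJson should yield a schema object that can be handed to spark.createDataFrame. Whether the resulting Parquet file matches the Dremel layout exactly is the open question of this thread.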