[jira] [Updated] (SPARK-20593) Writing Parquet: Cannot build an empty group

2017-05-04 Thread Viktor Khristenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Khristenko updated SPARK-20593:
--
Environment: I use Apache Spark 2.1.1 (used 2.1.0 and it was the same, 
switched today). Tested only on Mac  (was: I use Apache Spark 2.1.1 (used 2.1.0 
and it was the same, switched today). Tested only Mac)

> Writing Parquet: Cannot build an empty group
> 
>
> Key: SPARK-20593
> URL: https://issues.apache.org/jira/browse/SPARK-20593
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core, Spark Shell
>Affects Versions: 2.1.1
> Environment: I use Apache Spark 2.1.1 (used 2.1.0 and it was the 
> same, switched today). Tested only on Mac
>Reporter: Viktor Khristenko
>Priority: Minor
>
> Hi,
> This is my first ticket, so I apologize if I'm doing anything in an improper way.
>  I have a dataset:
> {noformat}
> root
> |-- muons: array (nullable = true)
> ||-- element: struct (containsNull = true)
> |||-- reco::Candidate: struct (nullable = true)
> |||-- qx3_: integer (nullable = true)
> |||-- pt_: float (nullable = true)
> |||-- eta_: float (nullable = true)
> |||-- phi_: float (nullable = true)
> |||-- mass_: float (nullable = true)
> |||-- vertex_: struct (nullable = true)
> ||||-- fCoordinates: struct (nullable = true)
> |||||-- fX: float (nullable = true)
> |||||-- fY: float (nullable = true)
> |||||-- fZ: float (nullable = true)
> |||-- pdgId_: integer (nullable = true)
> |||-- status_: integer (nullable = true)
> |||-- cachePolarFixed_: struct (nullable = true)
> |||-- cacheCartesianFixed_: struct (nullable = true)
> {noformat}
> As you can see, there are 3 empty structs in this schema. I can read and 
> manipulate this dataset without any problems. However, when I try to write it 
> to disk as Parquet with
> {noformat}
> ds.write.format("parquet").save(outputPathName)
> {noformat}
> I get the following exception:
> {noformat}
> java.lang.IllegalStateException: Cannot build an empty group
> at org.apache.parquet.Preconditions.checkState(Preconditions.java:91)
> at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:622)
> at org.apache.parquet.schema.Types$BaseGroupBuilder.build(Types.java:497)
> at org.apache.parquet.schema.Types$Builder.named(Types.java:286)
> at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:535)
> at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter.convertField(ParquetSchemaConverter.scala:321)
> at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField$1.apply(ParquetSchemaConverter.scala:534)
> at org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaConverter$$anonfun$convertField$1.apply(ParquetSchemaConverter.scala:533)
> {noformat}
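> For reference, a minimal standalone reproduction seems possible without the 
> original dataset. This is only a sketch for spark-shell; the column names and 
> the output path below are illustrative, not taken from the schema above:
> {noformat}
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
>
> // A single row whose schema contains a struct field with no children,
> // mimicking cachePolarFixed_ / cacheCartesianFixed_ above.
> val schema = StructType(Seq(
>   StructField("id", IntegerType, nullable = true),
>   StructField("emptyStruct", StructType(Nil), nullable = true)
> ))
>
> val df = spark.createDataFrame(
>   spark.sparkContext.parallelize(Seq(Row(1, Row()))),
>   schema
> )
>
> df.printSchema()
> // The write below is where "Cannot build an empty group" is expected to be thrown:
> df.write.format("parquet").save("/tmp/empty-struct-repro")
> {noformat}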
> So, basically, I would like to understand whether this is a bug or intended 
> behavior. I assume it is related to the empty structs. Any help would be 
> really appreciated!
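> (Not part of the original report, just a sketch of how one might confirm which 
> fields are the empty structs: the helper below walks a schema and prints the 
> paths of struct fields that have no children. It assumes spark-shell with the 
> dataset loaded as ds.)
> {noformat}
> import org.apache.spark.sql.types._
>
> // List the dotted paths of empty struct fields anywhere in a schema.
> def emptyStructPaths(dt: DataType, prefix: String = ""): Seq[String] = dt match {
>   case s: StructType if s.fields.isEmpty => Seq(prefix)
>   case s: StructType =>
>     s.fields.flatMap { f =>
>       emptyStructPaths(f.dataType, if (prefix.isEmpty) f.name else s"$prefix.${f.name}")
>     }
>   case a: ArrayType => emptyStructPaths(a.elementType, prefix)
>   case m: MapType   => emptyStructPaths(m.keyType, prefix) ++ emptyStructPaths(m.valueType, prefix)
>   case _            => Seq.empty
> }
>
> emptyStructPaths(ds.schema).foreach(println)
> // Expected for the schema above:
> //   muons.reco::Candidate
> //   muons.cachePolarFixed_
> //   muons.cacheCartesianFixed_
> {noformat}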
> I've quickly created a stripped version and that one writes without any 
> issues; one possible way to build such a copy is sketched below.
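> One possible way to build such a stripped copy (assuming that "stripped" means 
> the three empty structs are removed) is the explode/regroup round trip below. 
> Again this is only a sketch for spark-shell with the dataset as ds; event_id 
> is an artificial key added only so that rows can be regrouped, and note that 
> explode() drops events whose muons array is empty:
> {noformat}
> import org.apache.spark.sql.functions._
>
> // Re-project each muon keeping only the non-empty fields, then regroup per event.
> val stripped = ds
>   .withColumn("event_id", monotonically_increasing_id())
>   .select(col("event_id"), explode(col("muons")).as("m"))
>   .select(
>     col("event_id"),
>     struct(
>       col("m.qx3_"), col("m.pt_"), col("m.eta_"), col("m.phi_"), col("m.mass_"),
>       col("m.vertex_"), col("m.pdgId_"), col("m.status_")
>     ).as("muon")
>   )
>   .groupBy("event_id")
>   .agg(collect_list("muon").as("muons"))
>   .drop("event_id")
>
> // outputPathName as in the snippet above
> stripped.write.format("parquet").save(outputPathName)
> {noformat}
> As far as I can tell, Spark 2.1 has no built-in way to drop a field that is 
> nested inside an array of structs, hence the explode/regroup round trip here.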
> For reference, here is a link to the original question on Stack Overflow [1].
> VK
> [1] 
> http://stackoverflow.com/questions/43767358/apache-spark-parquet-cannot-build-an-empty-group




