[jira] [Commented] (SPARK-11183) enable support for mesos 0.24+

2016-02-03 Thread Vladimir Picka (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130728#comment-15130728
 ] 

Vladimir Picka commented on SPARK-11183:


Is there any ongoing support for Mesos in Spark? Mesos is already at 0.27. Are 
we locked to Mesos 0.21? The latest Mesos release that might still work is 
probably 0.23.

It seems like a real problem with Mesos: if any framework lags behind on 
updates, we can't upgrade. :(

> enable support for mesos 0.24+
> --
>
> Key: SPARK-11183
> URL: https://issues.apache.org/jira/browse/SPARK-11183
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Reporter: Ioannis Polyzos
>
> In Mesos 0.24, the Mesos leader info in ZooKeeper changed to JSON; this 
> results in Spark failing to run on 0.24+.
> References : 
>   https://issues.apache.org/jira/browse/MESOS-2340 
>   
> http://mail-archives.apache.org/mod_mbox/mesos-commits/201506.mbox/%3ced4698dc56444bcdac3bdf19134db...@git.apache.org%3E
>   https://github.com/mesos/elasticsearch/issues/338
>   https://github.com/spark-jobserver/spark-jobserver/issues/267
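The incompatibility referenced above can be sketched as follows. This is an
illustrative guess based on MESOS-2340 (pre-0.24 masters stored the leader info
as a serialized protobuf blob; 0.24+ masters store a JSON MasterInfo document);
the detection heuristic and field names below are assumptions for illustration,
not Mesos's or Spark's actual code:

```python
def is_json_leader_info(znode_data: bytes) -> bool:
    # A JSON MasterInfo document starts with '{'; a serialized protobuf
    # blob (the pre-0.24 format) does not, so a framework that wants to
    # support both Mesos versions could branch on the first byte.
    return znode_data[:1] == b"{"

new_style = b'{"hostname": "master1", "port": 5050}'  # Mesos 0.24+ style znode
old_style = bytes([0x0A, 0x07]) + b"master1"          # protobuf-style blob
assert is_json_leader_info(new_style)
assert not is_json_leader_info(old_style)
```

A framework pinned to the old binary parser fails on the JSON znode, which is
why Spark builds against the pre-0.24 format cannot find the leader.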



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10659) DataFrames and SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema

2015-09-18 Thread Vladimir Picka (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14805072#comment-14805072
 ] 

Vladimir Picka commented on SPARK-10659:


Thanks so much. I will put it to the test on our use case.

> DataFrames and SparkSQL saveAsParquetFile does not preserve REQUIRED (not 
> nullable) flag in schema
> --
>
> Key: SPARK-10659
> URL: https://issues.apache.org/jira/browse/SPARK-10659
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.5.0
>Reporter: Vladimir Picka
>






[jira] [Created] (SPARK-10659) SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema

2015-09-16 Thread Vladimir Picka (JIRA)
Vladimir Picka created SPARK-10659:
--

 Summary: SparkSQL saveAsParquetFile does not preserve REQUIRED 
(not nullable) flag in schema
 Key: SPARK-10659
 URL: https://issues.apache.org/jira/browse/SPARK-10659
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.5.0, 1.4.1, 1.4.0, 1.3.1, 1.3.0
Reporter: Vladimir Picka


DataFrames currently automatically promote all Parquet schema fields to 
optional when they are written to an empty directory. The problem remains in 
v1.5.0.

The culprit is this code:

val relation = if (doInsertion) {
  // This is a hack. We always set nullable/containsNull/valueContainsNull
  // to true for the schema of Parquet data.
  val df =
    sqlContext.createDataFrame(
      data.queryExecution.toRdd,
      data.schema.asNullable)
  val createdRelation =
    createRelation(sqlContext, parameters, df.schema)
      .asInstanceOf[ParquetRelation2]
  createdRelation.insert(df, overwrite = mode == SaveMode.Overwrite)
  createdRelation
}

which was implemented as part of this PR:
https://github.com/apache/spark/commit/1b490e91fd6b5d06d9caeb50e597639ccfc0bc3b

This is very unexpected behaviour for use cases where files are read from one 
place and written to another, such as small-file packing: it ends up producing 
incompatible files, because "required" normally cannot be promoted to 
"optional". It is the essence of a schema that it enforces the "required" 
invariant on data, so it should be presumed intentional.

I believe a better approach is to keep the schema as is by default and to 
provide, e.g., a builder method or option that allows forcing fields to 
optional.

Right now we have to override a private API so that our files are rewritten as 
is, with all the perils that entails.

Vladimir
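To make the effect described above concrete, here is a toy model (a Python
sketch with illustrative names, not Spark's actual API; Spark's real
StructField carries name, dataType and nullable in the same shape) of what
data.schema.asNullable does to a schema on the write path:

```python
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class Field:
    # Mirrors the shape of Spark's StructField: (name, dataType, nullable).
    name: str
    data_type: str
    nullable: bool

def as_nullable(schema: List[Field]) -> List[Field]:
    # Mimics data.schema.asNullable: every field is forced to nullable=True,
    # so a Parquet REQUIRED field comes back OPTIONAL after the round trip.
    return [replace(f, nullable=True) for f in schema]

original = [
    Field("id", "long", nullable=False),     # REQUIRED in Parquet terms
    Field("note", "string", nullable=True),  # OPTIONAL in Parquet terms
]
written = as_nullable(original)
assert all(f.nullable for f in written)      # the REQUIRED flag is lost
```

The proposed fix amounts to making this promotion opt-in (an option or builder
method) instead of applying it unconditionally on every write.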







[jira] [Updated] (SPARK-10659) DataFrames and SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema

2015-09-16 Thread Vladimir Picka (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Picka updated SPARK-10659:
---
Summary: DataFrames and SparkSQL saveAsParquetFile does not preserve 
REQUIRED (not nullable) flag in schema  (was: SparkSQL saveAsParquetFile does 
not preserve REQUIRED (not nullable) flag in schema)







[jira] [Commented] (SPARK-10659) DataFrames and SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema

2015-09-16 Thread Vladimir Picka (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791613#comment-14791613
 ] 

Vladimir Picka commented on SPARK-10659:


Here is unanswered attempt for discussion on a mailing list:
https://mail.google.com/mail/#search/label%3Aspark-user+petr/14f64c75c15f5ccd



