[jira] [Updated] (SPARK-18484) case class datasets - ability to specify decimal precision and scale

2016-11-16 Thread Damian Momot (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Momot updated SPARK-18484:
-
Description: 
Currently, when using a decimal type (BigDecimal in a Scala case class), there is
no way to enforce precision and scale. This is quite critical when saving data,
both for space usage and for compatibility with external systems (for example a
Hive table), because Spark saves the data as Decimal(38,18).

{code}
case class TestClass(id: String, money: BigDecimal)

val testDs = spark.createDataset(Seq(
  TestClass("1", BigDecimal("22.50")),
  TestClass("2", BigDecimal("500.66"))
))

testDs.printSchema()
{code}

{code}
root
 |-- id: string (nullable = true)
 |-- money: decimal(38,18) (nullable = true)
{code}

Workaround is to convert dataset to dataframe before saving and manually cast 
to specific decimal scale/precision:

{code}
import org.apache.spark.sql.types.DecimalType
val testDf = testDs.toDF()

testDf
  .withColumn("money", testDf("money").cast(DecimalType(10,2)))
  .printSchema()
{code}

{code}
root
 |-- id: string (nullable = true)
 |-- money: decimal(10,2) (nullable = true)
{code}

  was:
Currently when using decimal type (BigDecimal in scala case class) there's no 
way to enforce precision and scale. This is quite critical when saving data - 
regarding space usage and compatibility with external systems (for example Hive 
table) because spark saves data as Decimal(38,18)

{code:scala}
val spark: SparkSession = ???

case class TestClass(id: String, money: BigDecimal)

val testDs = spark.createDataset(Seq(
  TestClass("1", BigDecimal("22.50")),
  TestClass("2", BigDecimal("500.66"))
))

testDs.printSchema()
{code}

{code}
root
 |-- id: string (nullable = true)
 |-- money: decimal(38,18) (nullable = true)
{code}

Workaround is to convert dataset to dataframe before saving and manually cast 
to specific decimal scale/precision:

{code:scala}
import org.apache.spark.sql.types.DecimalType
val testDf = testDs.toDF()

testDf
  .withColumn("money", testDf("money").cast(DecimalType(10,2)))
  .printSchema()
{code}

{code}
root
 |-- id: string (nullable = true)
 |-- money: decimal(10,2) (nullable = true)
{code}


> case class datasets - ability to specify decimal precision and scale
> 
>
> Key: SPARK-18484
> URL: https://issues.apache.org/jira/browse/SPARK-18484
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Damian Momot
>
> Currently when using decimal type (BigDecimal in scala case class) there's no 
> way to enforce precision and scale. This is quite critical when saving data - 
> regarding space usage and compatibility with external systems (for example 
> Hive table) because spark saves data as Decimal(38,18)
> {code}
> case class TestClass(id: String, money: BigDecimal)
> val testDs = spark.createDataset(Seq(
>   TestClass("1", BigDecimal("22.50")),
>   TestClass("2", BigDecimal("500.66"))
> ))
> testDs.printSchema()
> {code}
> {code}
> root
>  |-- id: string (nullable = true)
>  |-- money: decimal(38,18) (nullable = true)
> {code}
> Workaround is to convert dataset to dataframe before saving and manually cast 
> to specific decimal scale/precision:
> {code}
> import org.apache.spark.sql.types.DecimalType
> val testDf = testDs.toDF()
> testDf
>   .withColumn("money", testDf("money").cast(DecimalType(10,2)))
>   .printSchema()
> {code}
> {code}
> root
>  |-- id: string (nullable = true)
>  |-- money: decimal(10,2) (nullable = true)
> {code}






[jira] [Created] (SPARK-18484) case class datasets - ability to specify decimal precision and scale

2016-11-16 Thread Damian Momot (JIRA)
Damian Momot created SPARK-18484:


 Summary: case class datasets - ability to specify decimal 
precision and scale
 Key: SPARK-18484
 URL: https://issues.apache.org/jira/browse/SPARK-18484
 Project: Spark
  Issue Type: Improvement
Affects Versions: 2.0.1, 2.0.0
Reporter: Damian Momot


Currently, when using a decimal type (BigDecimal in a Scala case class), there is
no way to enforce precision and scale. This is quite critical when saving data,
both for space usage and for compatibility with external systems (for example a
Hive table), because Spark saves the data as Decimal(38,18).

{code:scala}
val spark: SparkSession = ???

case class TestClass(id: String, money: BigDecimal)

val testDs = spark.createDataset(Seq(
  TestClass("1", BigDecimal("22.50")),
  TestClass("2", BigDecimal("500.66"))
))

testDs.printSchema()
{code}

{code}
root
 |-- id: string (nullable = true)
 |-- money: decimal(38,18) (nullable = true)
{code}

Workaround is to convert dataset to dataframe before saving and manually cast 
to specific decimal scale/precision:

{code:scala}
import org.apache.spark.sql.types.DecimalType
val testDf = testDs.toDF()

testDf
  .withColumn("money", testDf("money").cast(DecimalType(10,2)))
  .printSchema()
{code}

{code}
root
 |-- id: string (nullable = true)
 |-- money: decimal(10,2) (nullable = true)
{code}
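
If this has to be done for many columns or datasets, the cast can be wrapped in
a small helper before writing. A minimal sketch (the helper name is hypothetical;
it only combines the existing withColumn/cast calls shown above):

{code}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DecimalType

// Hypothetical convenience wrapper: re-cast selected columns to an explicit
// decimal precision/scale so external systems (e.g. Hive) see the intended type.
def withDecimals(df: DataFrame, decimals: Map[String, DecimalType]): DataFrame =
  decimals.foldLeft(df) { case (acc, (name, decimalType)) =>
    acc.withColumn(name, col(name).cast(decimalType))
  }

val fixedDf = withDecimals(testDs.toDF(), Map("money" -> DecimalType(10, 2)))
fixedDf.printSchema()  // money: decimal(10,2)
{code}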






[jira] [Commented] (SPARK-18484) case class datasets - ability to specify decimal precision and scale

2016-11-17 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15673257#comment-15673257
 ] 

Damian Momot commented on SPARK-18484:
--

I'm not saying it's not possible to work around it - I'm saying that manual
schema transformation/specification is not convenient at all.

In the Java world we could just use a simple annotation to specify
precision/scale for such a decimal field, and Spark could use it when inferring
the schema from the type.

For Scala, annotations are not the standard way, so maybe a different approach
could be used.

> case class datasets - ability to specify decimal precision and scale
> 
>
> Key: SPARK-18484
> URL: https://issues.apache.org/jira/browse/SPARK-18484
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Damian Momot
>
> Currently when using decimal type (BigDecimal in scala case class) there's no 
> way to enforce precision and scale. This is quite critical when saving data - 
> regarding space usage and compatibility with external systems (for example 
> Hive table) because spark saves data as Decimal(38,18)
> {code}
> case class TestClass(id: String, money: BigDecimal)
> val testDs = spark.createDataset(Seq(
>   TestClass("1", BigDecimal("22.50")),
>   TestClass("2", BigDecimal("500.66"))
> ))
> testDs.printSchema()
> {code}
> {code}
> root
>  |-- id: string (nullable = true)
>  |-- money: decimal(38,18) (nullable = true)
> {code}
> Workaround is to convert dataset to dataframe before saving and manually cast 
> to specific decimal scale/precision:
> {code}
> import org.apache.spark.sql.types.DecimalType
> val testDf = testDs.toDF()
> testDf
>   .withColumn("money", testDf("money").cast(DecimalType(10,2)))
>   .printSchema()
> {code}
> {code}
> root
>  |-- id: string (nullable = true)
>  |-- money: decimal(10,2) (nullable = true)
> {code}






[jira] [Commented] (SPARK-18484) case class datasets - ability to specify decimal precision and scale

2016-11-17 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15673347#comment-15673347
 ] 

Damian Momot commented on SPARK-18484:
--

The only thing that comes to my mind would be something like the following (it
would also need to use
http://www.scala-lang.org/api/2.9.2/scala/annotation/target/package.html):

{code}
case class TestClass(id: String, @DecimalPrecision(10, 2) money: BigDecimal)
{code}

But AFAIK that's not the "Scala way", and I don't have a better idea :)
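
For completeness, the exact decimal type can be obtained today by skipping
case-class inference for that column and supplying an explicit schema. A minimal
sketch (reuses the example values from the issue description; the values must
fit decimal(10,2)):

{code}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("id", StringType, nullable = true),
  StructField("money", DecimalType(10, 2), nullable = true)
))

// Rows carry java.math.BigDecimal values matching the declared DecimalType
val rows = Seq(
  Row("1", new java.math.BigDecimal("22.50")),
  Row("2", new java.math.BigDecimal("500.66"))
)

val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
df.printSchema()  // money: decimal(10,2)
{code}

It works, but it gives up the typed Dataset API for that step, which is exactly
the inconvenience described above.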

> case class datasets - ability to specify decimal precision and scale
> 
>
> Key: SPARK-18484
> URL: https://issues.apache.org/jira/browse/SPARK-18484
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Damian Momot
>
> Currently when using decimal type (BigDecimal in scala case class) there's no 
> way to enforce precision and scale. This is quite critical when saving data - 
> regarding space usage and compatibility with external systems (for example 
> Hive table) because spark saves data as Decimal(38,18)
> {code}
> case class TestClass(id: String, money: BigDecimal)
> val testDs = spark.createDataset(Seq(
>   TestClass("1", BigDecimal("22.50")),
>   TestClass("2", BigDecimal("500.66"))
> ))
> testDs.printSchema()
> {code}
> {code}
> root
>  |-- id: string (nullable = true)
>  |-- money: decimal(38,18) (nullable = true)
> {code}
> Workaround is to convert dataset to dataframe before saving and manually cast 
> to specific decimal scale/precision:
> {code}
> import org.apache.spark.sql.types.DecimalType
> val testDf = testDs.toDF()
> testDf
>   .withColumn("money", testDf("money").cast(DecimalType(10,2)))
>   .printSchema()
> {code}
> {code}
> root
>  |-- id: string (nullable = true)
>  |-- money: decimal(10,2) (nullable = true)
> {code}






[jira] [Created] (SPARK-18491) Spark uses mutable classes for date/time types mapping

2016-11-17 Thread Damian Momot (JIRA)
Damian Momot created SPARK-18491:


 Summary: Spark uses mutable classes for date/time types mapping
 Key: SPARK-18491
 URL: https://issues.apache.org/jira/browse/SPARK-18491
 Project: Spark
  Issue Type: Improvement
Reporter: Damian Momot


TimestampType is mapped to java.sql.Timestamp
DateType is mapped to java.sql.Date

Both of these Java types are mutable, and thus their usage is highly
discouraged, especially in distributed computing, which relies on a lazy,
functional approach.

Mapping to immutable Joda-Time types should be enough for now (until Scala 2.12 +
JDK 8 java.time is available for Spark).






[jira] [Updated] (SPARK-18491) Spark uses mutable classes for date/time types mapping

2016-11-17 Thread Damian Momot (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Momot updated SPARK-18491:
-
Description: 
TimestampType is mapped to java.sql.Timestamp
DateType is mapped to java.sql.Date

Both of these Java types are mutable, and thus their usage is highly
discouraged, especially in distributed computing, which relies on a lazy,
functional approach.

Mapping to immutable Joda-Time types should be enough for now (until Scala 2.12 +
JDK 8 java.time is available for Spark).

It becomes more relevant now that the schema is inferred from user-provided case
classes in Datasets.

  was:
TimestampType is mapped to java.sql.Timestamp
DateType is mapped to java.sql.Date

Those both java types are mutable and thus their usage is highly discourage, 
especially in distributed computing which uses lazy, functional approach

Mapping to immutable joda times should be enough for now (until scala 2.12 + 
jdk8 java.time is available for spark)


> Spark uses mutable classes for date/time types mapping
> --
>
> Key: SPARK-18491
> URL: https://issues.apache.org/jira/browse/SPARK-18491
> Project: Spark
>  Issue Type: Improvement
>Reporter: Damian Momot
>
> TimestampType is mapped to java.sql.Timestamp
> DateType is mapped to java.sql.Date
> Those both java types are mutable and thus their usage is highly discourage, 
> especially in distributed computing which uses lazy, functional approach
> Mapping to immutable joda times should be enough for now (until scala 2.12 + 
> jdk8 java.time is available for spark)
> It becomes more relevant now as schema is inferred from user provided case 
> classes in datasets






[jira] [Commented] (SPARK-18491) Spark uses mutable classes for date/time types mapping

2016-11-17 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15673590#comment-15673590
 ] 

Damian Momot commented on SPARK-18491:
--

I don't mean internal usage - it's about user-facing usage related to datasets.

If I want to create a dataset which will be mapped to the timestamp data type, I
have to use

{code}
case class Test(id: String, timestamp: java.sql.Timestamp)
{code}

It will work and save data properly to Parquet etc., but all my further dataset
manipulation code will be prone to problems related to field mutability.
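
For illustration, a minimal sketch of the kind of surprise this mutability
invites (plain Scala, no Spark involved; the mutation goes through
java.sql.Timestamp's inherited setTime):

{code}
case class Test(id: String, timestamp: java.sql.Timestamp)

val ts = new java.sql.Timestamp(0L)   // 1970-01-01T00:00:00Z
val record = Test("1", ts)

// The case class itself is immutable, but the shared Timestamp instance is not,
// so the record silently observes the new value after the in-place mutation.
ts.setTime(1000L)
println(record.timestamp.getTime)     // prints 1000, not 0
{code}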

> Spark uses mutable classes for date/time types mapping
> --
>
> Key: SPARK-18491
> URL: https://issues.apache.org/jira/browse/SPARK-18491
> Project: Spark
>  Issue Type: Improvement
>Reporter: Damian Momot
>Priority: Minor
>
> TimestampType is mapped to java.sql.Timestamp
> DateType is mapped to java.sql.Date
> Those both java types are mutable and thus their usage is highly discourage, 
> especially in distributed computing which uses lazy, functional approach
> Mapping to immutable joda times should be enough for now (until scala 2.12 + 
> jdk8 java.time is available for spark)






[jira] [Commented] (SPARK-18491) Spark uses mutable classes for date/time types mapping

2016-11-17 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15673608#comment-15673608
 ] 

Damian Momot commented on SPARK-18491:
--

Until Java 8 is available, Spark could do what all other JVM data mapping
libraries are doing: allow the use of Joda-Time classes, which solved the
mutability problem long ago and were the de facto standard Java date/time types
until JDK 8 was released.

> Spark uses mutable classes for date/time types mapping
> --
>
> Key: SPARK-18491
> URL: https://issues.apache.org/jira/browse/SPARK-18491
> Project: Spark
>  Issue Type: Improvement
>Reporter: Damian Momot
>Priority: Minor
>
> TimestampType is mapped to java.sql.Timestamp
> DateType is mapped to java.sql.Date
> Those both java types are mutable and thus their usage is highly discourage, 
> especially in distributed computing which uses lazy, functional approach
> Mapping to immutable joda times should be enough for now (until scala 2.12 + 
> jdk8 java.time is available for spark)






[jira] [Comment Edited] (SPARK-18491) Spark uses mutable classes for date/time types mapping

2016-11-17 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15673608#comment-15673608
 ] 

Damian Momot edited comment on SPARK-18491 at 11/17/16 12:37 PM:
-

Until Java 8 is available, Spark could do what all other JVM data mapping
libraries are doing: allow the use of Joda-Time classes, which solved the
mutability problem long ago and were the de facto standard Java date/time types
before JDK 8 was released.


was (Author: daimon):
Until Java 8 is available spark could do what all other JVM data mapping 
libraries are doing - allow to use Joda Time classes which solved mutability 
problem long ago and were defacto standard java date/time types until JDK8 was 
released.

> Spark uses mutable classes for date/time types mapping
> --
>
> Key: SPARK-18491
> URL: https://issues.apache.org/jira/browse/SPARK-18491
> Project: Spark
>  Issue Type: Improvement
>Reporter: Damian Momot
>Priority: Minor
>
> TimestampType is mapped to java.sql.Timestamp
> DateType is mapped to java.sql.Date
> Those both java types are mutable and thus their usage is highly discourage, 
> especially in distributed computing which uses lazy, functional approach
> Mapping to immutable joda times should be enough for now (until scala 2.12 + 
> jdk8 java.time is available for spark)






[jira] [Commented] (SPARK-18491) Spark uses mutable classes for date/time types mapping

2016-11-17 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15673655#comment-15673655
 ] 

Damian Momot commented on SPARK-18491:
--

Just to be clear - I'm not saying that Spark has internal problems because of
it.

I'm saying that because user code must use the mutable java.sql.* types, user
code is prone to bugs because of it.

Sorry for the confusion.

And AFAIK user code has to use those 2 java.sql.* types so Spark can properly
infer the schema from a case class dataset, right?

> Spark uses mutable classes for date/time types mapping
> --
>
> Key: SPARK-18491
> URL: https://issues.apache.org/jira/browse/SPARK-18491
> Project: Spark
>  Issue Type: Improvement
>Reporter: Damian Momot
>Priority: Minor
>
> TimestampType is mapped to java.sql.Timestamp
> DateType is mapped to java.sql.Date
> Those both java types are mutable and thus their usage is highly discourage, 
> especially in distributed computing which uses lazy, functional approach
> Mapping to immutable joda times should be enough for now (until scala 2.12 + 
> jdk8 java.time is available for spark)






[jira] [Commented] (SPARK-18491) Spark uses mutable classes for date/time types mapping

2016-11-17 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15673685#comment-15673685
 ] 

Damian Momot commented on SPARK-18491:
--

A good library/API is all about convenience :) If it weren't, we would still be
writing our processing in assembler ;)

> Spark uses mutable classes for date/time types mapping
> --
>
> Key: SPARK-18491
> URL: https://issues.apache.org/jira/browse/SPARK-18491
> Project: Spark
>  Issue Type: Improvement
>Reporter: Damian Momot
>Priority: Minor
>
> TimestampType is mapped to java.sql.Timestamp
> DateType is mapped to java.sql.Date
> Those both java types are mutable and thus their usage is highly discourage, 
> especially in distributed computing which uses lazy, functional approach
> Mapping to immutable joda times should be enough for now (until scala 2.12 + 
> jdk8 java.time is available for spark)






[jira] [Commented] (SPARK-18491) Spark uses mutable classes for date/time types mapping

2016-11-17 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15673719#comment-15673719
 ] 

Damian Momot commented on SPARK-18491:
--

You don't have to break compatibility. java.sql.Timestamp can still be the
internal/default type. At the same time, it could be possible to define the type
as:

{code}
case class Test(id: String, timestamp: org.joda.time.Instant)
{code}

And Dataset[Test] would infer the schema as TimestampType.

As you said, if it were possible to write custom encoders this could be easily
extended, but from my short findings there isn't a way to do that yet?
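
The closest stopgap that seems available today is a Kryo-based encoder, but it
does not give TimestampType - the whole object ends up in a single binary
column. A minimal sketch (assumes Joda-Time is on the classpath):

{code}
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

case class Test(id: String, timestamp: org.joda.time.Instant)

val spark: SparkSession = ???

// Kryo encoder lets Dataset[Test] compile, but serializes the entire object
// into one binary column instead of (string, timestamp) fields.
implicit val testEncoder: Encoder[Test] = Encoders.kryo[Test]

val ds = spark.createDataset(Seq(Test("1", org.joda.time.Instant.now())))
ds.printSchema()
// root
//  |-- value: binary (nullable = true)
{code}

So it sidesteps the compile error without actually mapping to TimestampType.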

> Spark uses mutable classes for date/time types mapping
> --
>
> Key: SPARK-18491
> URL: https://issues.apache.org/jira/browse/SPARK-18491
> Project: Spark
>  Issue Type: Improvement
>Reporter: Damian Momot
>Priority: Minor
>
> TimestampType is mapped to java.sql.Timestamp
> DateType is mapped to java.sql.Date
> Those both java types are mutable and thus their usage is highly discourage, 
> especially in distributed computing which uses lazy, functional approach
> Mapping to immutable joda times should be enough for now (until scala 2.12 + 
> jdk8 java.time is available for spark)






[jira] [Comment Edited] (SPARK-18491) Spark uses mutable classes for date/time types mapping

2016-11-17 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15673719#comment-15673719
 ] 

Damian Momot edited comment on SPARK-18491 at 11/17/16 1:25 PM:


You don't have to break compatibility. java.sql.Timestamp can still be the
internal/default type. At the same time, it could be possible to define the type
using Joda (it's on the classpath anyway) as:

{code}
case class Test(id: String, timestamp: org.joda.time.Instant)
{code}

And Dataset[Test] would infer the schema as TimestampType.

As you said, if it were possible to write custom encoders this could be easily
extended, but from my short findings there isn't a way to do that yet?


was (Author: daimon):
You don't have to break compatibility. java.sql.Timestamp can still be 
internal/default type. At the same time it could be possible to define type as:

{code}
case class Test(id: String, timestamp: org.joda.time.Instant)
{code}

And Dataset[Test] would infer schema as TimestampType

As you told if it was possible to write custom encoders this could be easily 
extended, but from my short findings there isn't way to do that yet?

> Spark uses mutable classes for date/time types mapping
> --
>
> Key: SPARK-18491
> URL: https://issues.apache.org/jira/browse/SPARK-18491
> Project: Spark
>  Issue Type: Improvement
>Reporter: Damian Momot
>Priority: Minor
>
> TimestampType is mapped to java.sql.Timestamp
> DateType is mapped to java.sql.Date
> Those both java types are mutable and thus their usage is highly discourage, 
> especially in distributed computing which uses lazy, functional approach
> Mapping to immutable joda times should be enough for now (until scala 2.12 + 
> jdk8 java.time is available for spark)






[jira] [Commented] (SPARK-18249) Spark 2.0.1 - StackOverflowError when saving dataset to parquet

2016-11-18 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15676143#comment-15676143
 ] 

Damian Momot commented on SPARK-18249:
--

We've moved several other flows and they are working fine. This one is quite
exotic, so maybe it hits some rare bug. I'll try to set up a spark-shell example
which triggers the same problem.

> Spark 2.0.1 - StackOverflowError when saving dataset to parquet
> ---
>
> Key: SPARK-18249
> URL: https://issues.apache.org/jira/browse/SPARK-18249
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.1
>Reporter: Damian Momot
>
> Not really sure what's exact reproduction path. It's first job which was 
> updated from spark 1.6.2 (dataframes) to 2.0.1 (datasets) - both on 
> hadoop.version=2.6.0-cdh5.6.0. It fails on last stage which saves data to 
> parquet files (actually small data during test)
> Exception in thread "main" org.apache.spark.SparkException: Job aborted.
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:510)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
>   at 
> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
>   [some user code calls removed]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.StackOverflowError
>   at 
> org.apache.spark.sql.catalyst.expressions.UnsafeRow.getBytes(UnsafeRow.java:593)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnsafeRow.writeExternal(UnsafeRow.java:661)
>   at 
> java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1456)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1427)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1175)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1541)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1506)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1175)
>   at 
> java.io.ObjectOutputStream.de

[jira] [Updated] (SPARK-18249) Spark 2.0.1 - StackOverflowError when saving dataset to parquet

2016-11-18 Thread Damian Momot (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Momot updated SPARK-18249:
-
Affects Version/s: 2.0.2

> Spark 2.0.1 - StackOverflowError when saving dataset to parquet
> ---
>
> Key: SPARK-18249
> URL: https://issues.apache.org/jira/browse/SPARK-18249
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.1, 2.0.2
>Reporter: Damian Momot
>
> Not really sure what's exact reproduction path. It's first job which was 
> updated from spark 1.6.2 (dataframes) to 2.0.1 (datasets) - both on 
> hadoop.version=2.6.0-cdh5.6.0. It fails on last stage which saves data to 
> parquet files (actually small data during test)
> Exception in thread "main" org.apache.spark.SparkException: Job aborted.
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:510)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
>   at 
> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
>   [some user code calls removed]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.StackOverflowError
>   at 
> org.apache.spark.sql.catalyst.expressions.UnsafeRow.getBytes(UnsafeRow.java:593)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnsafeRow.writeExternal(UnsafeRow.java:661)
>   at 
> java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1456)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1427)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1175)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1541)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1506)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1175)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1541)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1506)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream

[jira] [Updated] (SPARK-18249) StackOverflowError when saving dataset to parquet

2016-11-18 Thread Damian Momot (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Momot updated SPARK-18249:
-
Affects Version/s: 2.1.0
  Summary: StackOverflowError when saving dataset to parquet  (was: 
Spark 2.0.1 - StackOverflowError when saving dataset to parquet)

> StackOverflowError when saving dataset to parquet
> -
>
> Key: SPARK-18249
> URL: https://issues.apache.org/jira/browse/SPARK-18249
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.1, 2.0.2, 2.1.0
>Reporter: Damian Momot
>
> Not really sure what's exact reproduction path. It's first job which was 
> updated from spark 1.6.2 (dataframes) to 2.0.1 (datasets) - both on 
> hadoop.version=2.6.0-cdh5.6.0. It fails on last stage which saves data to 
> parquet files (actually small data during test)
> Exception in thread "main" org.apache.spark.SparkException: Job aborted.
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:510)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
>   at 
> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
>   [some user code calls removed]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.StackOverflowError
>   at 
> org.apache.spark.sql.catalyst.expressions.UnsafeRow.getBytes(UnsafeRow.java:593)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnsafeRow.writeExternal(UnsafeRow.java:661)
>   at 
> java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1456)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1427)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1175)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1541)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1506)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1175)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1541)
>   at 
> java.io.ObjectOutputStream.

[jira] [Commented] (SPARK-18249) StackOverflowError when saving dataset to parquet

2016-11-18 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15676734#comment-15676734
 ] 

Damian Momot commented on SPARK-18249:
--

Just checked that it also happens on 2.0.2 and 2.1.0 (latest master code).

I tried to create simplified code to reproduce it, but unfortunately the bug
doesn't happen with such code.

Is there any other way (dev logs, etc.) I could help find the cause of this
problem?

> StackOverflowError when saving dataset to parquet
> -
>
> Key: SPARK-18249
> URL: https://issues.apache.org/jira/browse/SPARK-18249
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.1, 2.0.2, 2.1.0
>Reporter: Damian Momot
>
> Not really sure what's exact reproduction path. It's first job which was 
> updated from spark 1.6.2 (dataframes) to 2.0.1 (datasets) - both on 
> hadoop.version=2.6.0-cdh5.6.0. It fails on last stage which saves data to 
> parquet files (actually small data during test)
> Exception in thread "main" org.apache.spark.SparkException: Job aborted.
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:510)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
>   at 
> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
>   [some user code calls removed]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.StackOverflowError
>   at 
> org.apache.spark.sql.catalyst.expressions.UnsafeRow.getBytes(UnsafeRow.java:593)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnsafeRow.writeExternal(UnsafeRow.java:661)
>   at 
> java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1456)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1427)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1175)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1541)
>   at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1506)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputSt

[jira] [Comment Edited] (SPARK-18484) case class datasets - ability to specify decimal precision and scale

2016-11-18 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15673347#comment-15673347
 ] 

Damian Momot edited comment on SPARK-18484 at 11/18/16 1:51 PM:


The only thing that comes to my mind would be something like the following (it
would also need to use
http://www.scala-lang.org/api/2.9.2/scala/annotation/target/package.html ):

{code}
case class TestClass(id: String, @DecimalPrecision(10, 2) money: BigDecimal)
{code}

But AFAIK that's not the "Scala way", and I don't have a better idea :)


was (Author: daimon):
Only thing which comes to my mind would be something like (+ it would need to 
use http://www.scala-lang.org/api/2.9.2/scala/annotation/target/package.html):

{code}
case class TestClass(id: String, @DecimalPrecision(10, 2) money: BigDecimal)
{code}

But AFAIK it's not "scala-way", don't have better idea :)

> case class datasets - ability to specify decimal precision and scale
> 
>
> Key: SPARK-18484
> URL: https://issues.apache.org/jira/browse/SPARK-18484
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Damian Momot
>
> Currently when using decimal type (BigDecimal in scala case class) there's no 
> way to enforce precision and scale. This is quite critical when saving data - 
> regarding space usage and compatibility with external systems (for example 
> Hive table) because spark saves data as Decimal(38,18)
> {code}
> case class TestClass(id: String, money: BigDecimal)
> val testDs = spark.createDataset(Seq(
>   TestClass("1", BigDecimal("22.50")),
>   TestClass("2", BigDecimal("500.66"))
> ))
> testDs.printSchema()
> {code}
> {code}
> root
>  |-- id: string (nullable = true)
>  |-- money: decimal(38,18) (nullable = true)
> {code}
> Workaround is to convert dataset to dataframe before saving and manually cast 
> to specific decimal scale/precision:
> {code}
> import org.apache.spark.sql.types.DecimalType
> val testDf = testDs.toDF()
> testDf
>   .withColumn("money", testDf("money").cast(DecimalType(10,2)))
>   .printSchema()
> {code}
> {code}
> root
>  |-- id: string (nullable = true)
>  |-- money: decimal(10,2) (nullable = true)
> {code}






[jira] [Commented] (SPARK-18249) StackOverflowError when saving dataset to parquet

2016-11-21 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15682893#comment-15682893
 ] 

Damian Momot commented on SPARK-18249:
--

Hi, nope it's simple case class, exactly this:

{code}
case class ParsedUserAgent (
  user_agent: String,
  parser_version: String,
  client_type: Option[String],
  client_name: Option[String],
  client_version: Option[String],
  operating_system_name: Option[String],
  operating_system_version: Option[String],
  operating_system_platform: Option[String],
  device_type: Option[String],
  device_brand: Option[String],
  device_model: Option[String],
  is_bot: Option[Boolean],
  is_mobile: Option[Boolean],
  is_desktop: Option[Boolean]
)
{/code}

The flow itself is a bit more complicated, as such structures are being read
from HDFS, filtered, projected, distinct-collected, inserted into datasets
again, unioned with another dataset, grouped and reduced.

In the meantime there are some repartitions and caches using
StorageLevel.MEMORY_AND_DISK_SER.

When it first happened I tried:
- removing intermediate dataset saves/caches
- removing explicit repartitions
- switching from Kryo serialization to standard Java serialization

But none of those solved the problem.

I also tried to create an equivalent spark-shell flow (but on simple
structures), but it didn't trigger the bug.
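
For reference, a rough sketch of the shape of that flow as it might look in
spark-shell (paths, predicates, the partition count and the reduce function are
placeholders, not the real job):

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark: SparkSession = ???
import spark.implicits._

val base  = spark.read.parquet("/path/to/input").as[ParsedUserAgent]   // placeholder path
val other = spark.read.parquet("/path/to/other").as[ParsedUserAgent]   // placeholder path

val result = base
  .filter(_.is_bot.contains(false))            // placeholder predicate
  .distinct()
  .union(other)
  .groupByKey(_.user_agent)
  .reduceGroups((a, b) => a)                   // placeholder reduce
  .map(_._2)
  .repartition(200)                            // placeholder partition count
  .persist(StorageLevel.MEMORY_AND_DISK_SER)

result.write.parquet("/path/to/output")        // the step that hits the StackOverflowError
{code}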

> StackOverflowError when saving dataset to parquet
> -
>
> Key: SPARK-18249
> URL: https://issues.apache.org/jira/browse/SPARK-18249
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.1, 2.0.2, 2.1.0
>Reporter: Damian Momot
>
> Not really sure what's exact reproduction path. It's first job which was 
> updated from spark 1.6.2 (dataframes) to 2.0.1 (datasets) - both on 
> hadoop.version=2.6.0-cdh5.6.0. It fails on last stage which saves data to 
> parquet files (actually small data during test)
> Exception in thread "main" org.apache.spark.SparkException: Job aborted.
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:510)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
>   at 
> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
>   [some user code calls removed]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
>   at 
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
>   at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)

[jira] [Comment Edited] (SPARK-18249) StackOverflowError when saving dataset to parquet

2016-11-21 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15682893#comment-15682893
 ] 

Damian Momot edited comment on SPARK-18249 at 11/21/16 8:35 AM:


Hi, nope it's simple case class, exactly this:

{code}
case class ParsedUserAgent (
  user_agent: String,
  parser_version: String,
  client_type: Option[String],
  client_name: Option[String],
  client_version: Option[String],
  operating_system_name: Option[String],
  operating_system_version: Option[String],
  operating_system_platform: Option[String],
  device_type: Option[String],
  device_brand: Option[String],
  device_model: Option[String],
  is_bot: Option[Boolean],
  is_mobile: Option[Boolean],
  is_desktop: Option[Boolean]
)
{code}

The flow itself is a bit more complicated, as such structures are being read
from HDFS, filtered, projected, distinct-collected, inserted into datasets
again, unioned with another dataset, grouped and reduced.

In the meantime there are some repartitions and caches using
StorageLevel.MEMORY_AND_DISK_SER.

When it first happened I tried:
- removing intermediate dataset saves/caches
- removing explicit repartitions
- switching from Kryo serialization to standard Java serialization

But none of those solved the problem.

I also tried to create an equivalent spark-shell flow (but on simple
structures), but it didn't trigger the bug.


was (Author: daimon):
Hi, nope it's simple case class, exactly this:

{code}
case class ParsedUserAgent (
  user_agent: String,
  parser_version: String,
  client_type: Option[String],
  client_name: Option[String],
  client_version: Option[String],
  operating_system_name: Option[String],
  operating_system_version: Option[String],
  operating_system_platform: Option[String],
  device_type: Option[String],
  device_brand: Option[String],
  device_model: Option[String],
  is_bot: Option[Boolean],
  is_mobile: Option[Boolean],
  is_desktop: Option[Boolean]
)
{/code}

Flow itself is a bit more complicated as such structures are being read from 
HDFS, filtered, projected, distinct-collected, inserted into datasets again, 
unioned with other dataset, grouped and reduced.

In meantime there are some repartitions and caches using 
StorageLevel.MEMORY_AND_DISK_SER

When it first happened i tried:
- removing intermediate dataset saves/caches
- removing explicit repartitions
- switching from kryo serialziation to standard java serialziation

But no of those solved a problem.

I also tried to create spark-shell equivalent flow (but on simple structures) 
but it didn't trigger bug

> StackOverflowError when saving dataset to parquet
> -
>
> Key: SPARK-18249
> URL: https://issues.apache.org/jira/browse/SPARK-18249
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.1, 2.0.2, 2.1.0
>Reporter: Damian Momot
>
> Not really sure what's exact reproduction path. It's first job which was 
> updated from spark 1.6.2 (dataframes) to 2.0.1 (datasets) - both on 
> hadoop.version=2.6.0-cdh5.6.0. It fails on last stage which saves data to 
> parquet files (actually small data during test)
> Exception in thread "main" org.apache.spark.SparkException: Job aborted.
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>   at 
> 

[jira] [Comment Edited] (SPARK-18249) StackOverflowError when saving dataset to parquet

2016-11-21 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15682893#comment-15682893
 ] 

Damian Momot edited comment on SPARK-18249 at 11/21/16 8:37 AM:


Hi, nope it's simple case class, exactly this:

{code}
case class ParsedUserAgent (
  user_agent: String,
  parser_version: String,
  client_type: Option[String],
  client_name: Option[String],
  client_version: Option[String],
  operating_system_name: Option[String],
  operating_system_version: Option[String],
  operating_system_platform: Option[String],
  device_type: Option[String],
  device_brand: Option[String],
  device_model: Option[String],
  is_bot: Option[Boolean],
  is_mobile: Option[Boolean],
  is_desktop: Option[Boolean]
)
{code}

The flow itself is a bit more complicated, as such structures are being read
from HDFS, filtered, projected, distinct-collected, inserted into datasets
again, unioned with another dataset, grouped and reduced.

In the meantime there are some repartitions and caches using
StorageLevel.MEMORY_AND_DISK_SER.

When it first happened I tried:
- removing intermediate dataset saves/caches
- removing explicit repartitions
- switching from Kryo serialization to standard Java serialization

But none of those solved the problem.

I also tried to create an equivalent spark-shell flow (but on simple
structures), but it didn't trigger the bug.

As I mentioned, the same flow worked fine on 1.6.2 using dataframes/rdds.


was (Author: daimon):
Hi, nope it's simple case class, exactly this:

{code}
case class ParsedUserAgent (
  user_agent: String,
  parser_version: String,
  client_type: Option[String],
  client_name: Option[String],
  client_version: Option[String],
  operating_system_name: Option[String],
  operating_system_version: Option[String],
  operating_system_platform: Option[String],
  device_type: Option[String],
  device_brand: Option[String],
  device_model: Option[String],
  is_bot: Option[Boolean],
  is_mobile: Option[Boolean],
  is_desktop: Option[Boolean]
)
{code}

The flow itself is a bit more complicated, as such structures are read from 
HDFS, filtered, projected, distinct-collected, inserted into datasets again, 
unioned with another dataset, grouped and reduced.

In the meantime there are some repartitions and caches using 
StorageLevel.MEMORY_AND_DISK_SER.

When it first happened I tried:
- removing intermediate dataset saves/caches
- removing explicit repartitions
- switching from Kryo serialization to standard Java serialization

But none of those solved the problem.

I also tried to create an equivalent flow in spark-shell (but on simple 
structures), but it didn't trigger the bug.

> StackOverflowError when saving dataset to parquet
> -
>
> Key: SPARK-18249
> URL: https://issues.apache.org/jira/browse/SPARK-18249
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.1, 2.0.2, 2.1.0
>Reporter: Damian Momot
>
> Not really sure what's exact reproduction path. It's first job which was 
> updated from spark 1.6.2 (dataframes) to 2.0.1 (datasets) - both on 
> hadoop.version=2.6.0-cdh5.6.0. It fails on last stage which saves data to 
> parquet files (actually small data during test)
> Exception in thread "main" org.apache.spark.SparkException: Job aborted.
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>   at org.apach

[jira] [Created] (SPARK-18540) Wholestage code-gen for ORC Hive tables

2016-11-22 Thread Damian Momot (JIRA)
Damian Momot created SPARK-18540:


 Summary: Wholestage code-gen for ORC Hive tables
 Key: SPARK-18540
 URL: https://issues.apache.org/jira/browse/SPARK-18540
 Project: Spark
  Issue Type: Improvement
Reporter: Damian Momot


Spark already optimizes reading of Hive Parquet tables using whole-stage code 
generation.

A similar approach could be used for scanning Hive ORC tables - currently the 
standard Hive table scan is used.

ORC is sometimes preferred over Parquet in the Hive ecosystem because of better 
support and characteristics in certain scenarios.
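
As a reference point, whether a given scan is covered by whole-stage code 
generation can be checked from the physical plan (table name below is 
hypothetical):

{code}
import org.apache.spark.sql.SparkSession

val spark: SparkSession = ???

// Operators running inside whole-stage codegen are prefixed with "*" in the
// physical plan; a Hive ORC table scan currently appears as a plain
// HiveTableScan node without that prefix.
spark.sql("SELECT col_a, col_b FROM db.orc_table WHERE col_a > 0").explain()
{code}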






[jira] [Created] (SPARK-18556) Suboptimal number of tasks when writing partitioned data with desired number of files per directory

2016-11-22 Thread Damian Momot (JIRA)
Damian Momot created SPARK-18556:


 Summary: Suboptimal number of tasks when writing partitioned data 
with desired number of files per directory
 Key: SPARK-18556
 URL: https://issues.apache.org/jira/browse/SPARK-18556
 Project: Spark
  Issue Type: Improvement
Affects Versions: 2.0.2, 2.0.1, 2.0.0
Reporter: Damian Momot


It's not possible to get an optimal number of write tasks when the desired number 
of files per directory is known. Example:

When saving data to HDFS:

1. The data is supposed to be partitioned by a column (for example date) - it 
contains, for example, 90 different dates
2. There is upfront knowledge that each date should be written into X files (for 
example 4, because of the recommended HDFS/Parquet block size etc.)
3. During processing, the dataset was partitioned into 200 partitions (for example 
because of some grouping operations)

Currently we can do:

{code}
val data: Dataset[Row] = ???

data
  .write
  .partitionBy("date")
  .parquet("/xyz")
{code}

This will properly write data into 90 date directories (see point '1'), but each 
directory will contain 200 files (see point '3').

We can force the number of files by using repartition/coalesce:

{code}
val data: Dataset[Row] = ???

data
  .repartition(4)
  .write
  .partitionBy("date")
  .parquet("xyz")
{code}

This will properly save 90 directories with 4 files each... but it will be done 
using only 4 tasks, which is way too slow - 360 files could be written in 
parallel using 360 tasks.
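
One possible workaround (a sketch only, not something proposed in this report): 
add a synthetic bucket column and repartition by (date, bucket), so each date 
directory gets at most the desired number of files while the shuffle still 
produces 90 * 4 = 360 write tasks:

{code}
import org.apache.spark.sql.{Dataset, Row}
import org.apache.spark.sql.functions._

val data: Dataset[Row] = ???
val filesPerDate = 4

// Each (date, bucket) pair hashes to a single shuffle partition, so every date
// directory ends up with at most `filesPerDate` files, written by 360 tasks.
data
  .withColumn("bucket", (rand() * filesPerDate).cast("int"))
  .repartition(90 * filesPerDate, col("date"), col("bucket"))
  .drop("bucket")
  .write
  .partitionBy("date")
  .parquet("/xyz")
{code}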






[jira] [Created] (SPARK-18717) Datasets - crash (compile exception) when mapping to immutable scala map

2016-12-05 Thread Damian Momot (JIRA)
Damian Momot created SPARK-18717:


 Summary: Datasets - crash (compile exception) when mapping to 
immutable scala map
 Key: SPARK-18717
 URL: https://issues.apache.org/jira/browse/SPARK-18717
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.2
Reporter: Damian Momot


{code}
val spark: SparkSession = ???

case class Test(id: String, map_test: Map[Long, String])

spark.sql("CREATE TABLE xyz.map_test (id string, map_test map) 
STORED AS PARQUET")

spark.sql("SELECT * FROM xyz.map_test").as[Test].map(t => t).collect()
{code}

{code}
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
307, Column 108: No applicable constructor/method found for actual parameters 
"java.lang.String, scala.collection.Map"; candidates are: 
"$line14.$read$$iw$$iw$Test(java.lang.String, scala.collection.immutable.Map)"
{code}






[jira] [Commented] (SPARK-18717) Datasets - crash (compile exception) when mapping to immutable scala map

2016-12-05 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15724702#comment-15724702
 ] 

Damian Momot commented on SPARK-18717:
--

Yep, it's already worked around this way in my code, but using standard Scala 
types is simply more natural.
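
For reference, a sketch of what the workaround presumably looks like - declaring 
the field with scala.collection.Map (the type the generated deserializer passes 
to the constructor) instead of the default immutable.Map:

{code}
import org.apache.spark.sql.SparkSession

val spark: SparkSession = ???
import spark.implicits._

// Workaround sketch (assumption based on the compile error above): use the
// non-immutable scala.collection.Map alias for the field type.
case class Test(id: String, map_test: scala.collection.Map[Long, String])

spark.sql("SELECT * FROM xyz.map_test").as[Test].map(t => t).collect()
{code}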

> Datasets - crash (compile exception) when mapping to immutable scala map
> 
>
> Key: SPARK-18717
> URL: https://issues.apache.org/jira/browse/SPARK-18717
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.2
>Reporter: Damian Momot
>
> {code}
> val spark: SparkSession = ???
> case class Test(id: String, map_test: Map[Long, String])
> spark.sql("CREATE TABLE xyz.map_test (id string, map_test map) 
> STORED AS PARQUET")
> spark.sql("SELECT * FROM xyz.map_test").as[Test].map(t => t).collect()
> {code}
> {code}
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 307, Column 108: No applicable constructor/method found for actual parameters 
> "java.lang.String, scala.collection.Map"; candidates are: 
> "$line14.$read$$iw$$iw$Test(java.lang.String, scala.collection.immutable.Map)"
> {code}






[jira] [Created] (SPARK-18249) Spark 2.0.1 - StackOverflowError when saving dataset to parquet

2016-11-03 Thread Damian Momot (JIRA)
Damian Momot created SPARK-18249:


 Summary: Spark 2.0.1 - StackOverflowError when saving dataset to 
parquet
 Key: SPARK-18249
 URL: https://issues.apache.org/jira/browse/SPARK-18249
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.1
Reporter: Damian Momot


Not really sure what the exact reproduction path is. It's the first job which was 
updated from Spark 1.6.2 (dataframes) to 2.0.1 (datasets) - both on 
hadoop.version=2.6.0-cdh5.6.0. It fails on the last stage, which saves data to 
Parquet files (actually small data during the test).

Exception in thread "main" org.apache.spark.SparkException: Job aborted.
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
at 
org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:510)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194)
at 
org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488)
[some user code calls removed]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.StackOverflowError
at 
org.apache.spark.sql.catalyst.expressions.UnsafeRow.getBytes(UnsafeRow.java:593)
at 
org.apache.spark.sql.catalyst.expressions.UnsafeRow.writeExternal(UnsafeRow.java:661)
at 
java.io.ObjectOutputStream.writeExternalData(ObjectOutputStream.java:1456)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1427)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1175)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1541)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1506)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1175)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1541)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1506)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1429)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1175)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1541)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1506)
at 
java.io.Ob

[jira] [Commented] (SPARK-16753) Spark SQL doesn't handle skewed dataset joins properly

2016-11-07 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643525#comment-15643525
 ] 

Damian Momot commented on SPARK-16753:
--

Isn't it the same problem as https://issues.apache.org/jira/browse/SPARK-9862 
(which already has a pull request)?

> Spark SQL doesn't handle skewed dataset joins properly
> --
>
> Key: SPARK-16753
> URL: https://issues.apache.org/jira/browse/SPARK-16753
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1, 1.6.2, 2.0.0, 2.0.1
>Reporter: Jurriaan Pruis
> Attachments: screenshot-1.png
>
>
> I'm having issues with joining a 1 billion row dataframe with skewed data 
> with multiple dataframes with sizes ranging from 100,000 to 10 million rows. 
> This means some of the joins (about half of them) can be done using 
> broadcast, but not all.
> Because the data in the large dataframe is skewed we get out of memory errors 
> in the executors or errors like: 
> `org.apache.spark.shuffle.FetchFailedException: Too large frame`.
> We tried a lot of things, like broadcast joining the skewed rows separately 
> and unioning them with the dataset containing the sort merge joined data. 
> Which works perfectly when doing one or two joins, but when doing 10 joins 
> like this the query planner gets confused (see [SPARK-15326]).
> As most of the rows are skewed on the NULL value we use a hack where we put 
> unique values in those NULL columns so the data is properly distributed over 
> all partitions. This works fine for NULL values, but since this table is 
> growing rapidly and we have skewed data for non-NULL values as well this 
> isn't a full solution to the problem.
> Right now this specific spark task runs well 30% of the time and it's getting 
> worse and worse because of the increasing amount of data.
> How to approach these kinds of joins using Spark? It seems weird that I can't 
> find proper solutions for this problem/other people having the same kind of 
> issues when Spark profiles itself as a large-scale data processing engine. 
> Doing joins on big datasets should be a thing Spark should have no problem 
> with out of the box.
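
For illustration, a minimal sketch of the NULL-salting hack mentioned in the 
quoted description above (bigDf, smallDf and join_key are hypothetical names). 
Since NULL keys never match in an equi-join anyway, replacing them with unique 
values spreads the skewed NULL group across partitions without changing the join 
result:

{code}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Hypothetical inputs
val bigDf: DataFrame   = ???
val smallDf: DataFrame = ???

val salted = bigDf.withColumn(
  "join_key",
  when(col("join_key").isNull,
    concat(lit("skew_null_"), monotonically_increasing_id().cast("string")))
    .otherwise(col("join_key")))

// Left outer join keeps the (formerly NULL-keyed) rows unmatched, just as before
val joined = salted.join(smallDf, Seq("join_key"), "left_outer")
{code}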






[jira] [Commented] (SPARK-9862) Join: Handling data skew

2017-05-22 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020683#comment-16020683
 ] 

Damian Momot commented on SPARK-9862:
-

Is it really "IN PROGRESS"? It looks like it's stalled.

> Join: Handling data skew
> 
>
> Key: SPARK-9862
> URL: https://issues.apache.org/jira/browse/SPARK-9862
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
> Attachments: Handling skew data in join.pdf
>
>
> For a two way shuffle join, if one or multiple groups are skewed in one table 
> (say left table) but having a relative small number of rows in another table 
> (say right table), we can use broadcast join for these skewed groups and use 
> shuffle join for other groups.
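
A minimal sketch of the strategy described in the quoted text (left, right, the 
key column k and skewedKeys are hypothetical): broadcast-join the skewed groups 
against the matching slice of the right table, shuffle-join everything else, then 
union the results:

{code}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{broadcast, col}

// Hypothetical inputs and skewed-key list
val left: DataFrame  = ???
val right: DataFrame = ???
val skewedKeys: Seq[String] = Seq("hot_key_1", "hot_key_2")

val skewedPart  = left.filter(col("k").isin(skewedKeys: _*))
val regularPart = left.filter(!col("k").isin(skewedKeys: _*))

val joined =
  skewedPart.join(broadcast(right.filter(col("k").isin(skewedKeys: _*))), Seq("k"))
    .union(regularPart.join(right, Seq("k")))
{code}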






[jira] [Updated] (SPARK-19293) Spark 2.1.x unstable with spark.speculation=true

2017-05-25 Thread Damian Momot (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Momot updated SPARK-19293:
-
Affects Version/s: 2.1.1
  Component/s: Spark Core
  Summary: Spark 2.1.x unstable with spark.speculation=true  (was: 
Spark 2.1.0 unstable with spark.speculation=true)

> Spark 2.1.x unstable with spark.speculation=true
> 
>
> Key: SPARK-19293
> URL: https://issues.apache.org/jira/browse/SPARK-19293
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Damian Momot
>Priority: Critical
>
> After upgrading from Spark 2.0.2 to 2.1.0 we've observed that jobs are often 
> failing when speculative mode is enabled.
> In 2.0.2 speculative tasks were simply skipped if they were not used for 
> result (i.e. other instance finished earlier) - and it was clearly visible in 
> UI that those tasks were not counted as failures.
> In 2.1.0 many tasks are being marked failed/killed when speculative tasks 
> start to run (that is at the end of stage when there are spare executors to 
> use) which also leads to entire stage/job failures.
> Disabling spark.speculation solves failing problem - but speculative mode is 
> very useful especially when different executors run on machines with varying 
> load (for example in YARN)






[jira] [Commented] (SPARK-19293) Spark 2.1.x unstable with spark.speculation=true

2017-05-25 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024684#comment-16024684
 ] 

Damian Momot commented on SPARK-19293:
--

From what I can see, speculative tasks are failing with the following exceptions 
(and without speculation everything works fine):

{code}
java.io.IOException: Failed on local exception: 
java.nio.channels.ClosedByInterruptException; Host Details : local host is: 
"XYZ"; destination host is: "XYZ":8020; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1470)
at org.apache.hadoop.ipc.Client.call(Client.java:1403)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy23.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:757)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy24.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2102)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1214)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1210)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1210)
at 
org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:386)
at 
org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
at 
org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:145)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReader$1.apply(ParquetFileFormat.scala:399)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReader$1.apply(ParquetFileFormat.scala:355)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:168)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:166)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:650)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609)
at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:708)

[jira] [Comment Edited] (SPARK-19293) Spark 2.1.x unstable with spark.speculation=true

2017-05-25 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024684#comment-16024684
 ] 

Damian Momot edited comment on SPARK-19293 at 5/25/17 12:44 PM:


From what I can see, speculative tasks are failing with the following exceptions 
(and without speculation everything works fine):

{code}
java.io.IOException: Failed on local exception: 
java.nio.channels.ClosedByInterruptException; Host Details : local host is: 
"XYZ"; destination host is: "XYZ":8020; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1470)
at org.apache.hadoop.ipc.Client.call(Client.java:1403)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy23.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:757)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy24.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2102)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1214)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1210)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1210)
at 
org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:386)
at 
org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
at 
org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:145)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReader$1.apply(ParquetFileFormat.scala:399)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReader$1.apply(ParquetFileFormat.scala:355)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:168)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:166)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:650)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:609)
at 
org.apache.hadoop.ipc.Clie

[jira] [Comment Edited] (SPARK-19293) Spark 2.1.x unstable with spark.speculation=true

2017-05-25 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024684#comment-16024684
 ] 

Damian Momot edited comment on SPARK-19293 at 5/25/17 12:45 PM:


From what I can see, speculative tasks are failing with the following exceptions 
(and without speculation everything works fine):

They are all getting FAILED status (which probably should be skipped or 
something like that)

{code}
java.io.IOException: Failed on local exception: 
java.nio.channels.ClosedByInterruptException; Host Details : local host is: 
"XYZ"; destination host is: "XYZ":8020; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1470)
at org.apache.hadoop.ipc.Client.call(Client.java:1403)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy23.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:757)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy24.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2102)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1214)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1210)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1210)
at 
org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:386)
at 
org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
at 
org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:145)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReader$1.apply(ParquetFileFormat.scala:399)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReader$1.apply(ParquetFileFormat.scala:355)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:168)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:166)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:650)
at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
at 
org.apache.hadoo

[jira] [Comment Edited] (SPARK-19293) Spark 2.1.x unstable with spark.speculation=true

2017-05-25 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024684#comment-16024684
 ] 

Damian Momot edited comment on SPARK-19293 at 5/25/17 12:50 PM:


From what I can see, speculative tasks are failing with the following exceptions 
(and without speculation everything works fine):

They are all getting FAILED status (which probably should be skipped or 
something like that)

{code}
java.nio.channels.ClosedByInterruptException
at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:253)
at 
org.apache.spark.storage.DiskBlockObjectWriter.commitAndGet(DiskBlockObjectWriter.scala:182)
at 
org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:701)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:72)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
{code}

or

{code}
java.io.IOException: Failed on local exception: 
java.nio.channels.ClosedByInterruptException; Host Details : local host is: 
"XYZ"; destination host is: "XYZ":8020; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1470)
at org.apache.hadoop.ipc.Client.call(Client.java:1403)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy23.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:757)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy24.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2102)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1214)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1210)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1210)
at 
org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:386)
at 
org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
at 
org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:145)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReader$1.apply(ParquetFileFormat.scala:399)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReader$1.apply(ParquetFileFormat.scala:355)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:168)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.shuffle.sort.Uns

[jira] [Commented] (SPARK-19293) Spark 2.1.x unstable with spark.speculation=true

2017-05-26 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025974#comment-16025974
 ] 

Damian Momot commented on SPARK-19293:
--

Some additional errors for speculative tasks:

{code}
java.lang.reflect.UndeclaredThrowableException
at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2102)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1214)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1210)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1210)
at 
org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:386)
at 
org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
at 
org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:145)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReader$1.apply(ParquetFileFormat.scala:399)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReader$1.apply(ParquetFileFormat.scala:355)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:168)
at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:159)
... 28 more
{code}

{code}
java.lang.IllegalArgumentException: requirement failed: cannot write to a 
closed ByteBufferOutputStream
at scala.Predef$.require(Predef.scala:224)
at 
org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:40)
at 
java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
at 
java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
at 
java.io.ObjectOutputStream.writeFatalException(ObjectOutputStream.java:1580)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:351)
at 
org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
at 
org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at 
org.apache.spark.rpc.netty.NettyRpcEnv.serialize(NettyRpcEnv.scala:253)
at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:192)
at 
org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:512)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend.statusUpdate(CoarseGrainedExecutorBackend.scala:142)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:409)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

> Spark 2.1.x unstable with spark.speculation=true
> 
>
> Key: SPARK-19293
> URL: https://issues.apache.org/jira/browse/SPARK-19293
> Project: Sp

[jira] [Commented] (SPARK-19293) Spark 2.1.x unstable with spark.speculation=true

2017-06-21 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057091#comment-16057091
 ] 

Damian Momot commented on SPARK-19293:
--

Yep,

Some tasks are marked as "killed" but some become "failed". In some specific 
cases, if the number of failures is very big, it causes the entire Spark job to 
fail. Disabling "speculation" eliminates the failures entirely.

It was working flawlessly before Spark 2.1 with "speculation" enabled.

> Spark 2.1.x unstable with spark.speculation=true
> 
>
> Key: SPARK-19293
> URL: https://issues.apache.org/jira/browse/SPARK-19293
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Damian Momot
>Priority: Critical
>
> After upgrading from Spark 2.0.2 to 2.1.0 we've observed that jobs are often 
> failing when speculative mode is enabled.
> In 2.0.2 speculative tasks were simply skipped if they were not used for 
> result (i.e. other instance finished earlier) - and it was clearly visible in 
> UI that those tasks were not counted as failures.
> In 2.1.0 many tasks are being marked failed/killed when speculative tasks 
> start to run (that is at the end of stage when there are spare executors to 
> use) which also leads to entire stage/job failures.
> Disabling spark.speculation solves failing problem - but speculative mode is 
> very useful especially when different executors run on machines with varying 
> load (for example in YARN)






[jira] [Commented] (SPARK-19293) Spark 2.1.x unstable with spark.speculation=true

2017-06-21 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057157#comment-16057157
 ] 

Damian Momot commented on SPARK-19293:
--

I'll try to build from the 2.2 branch and test it today.

> Spark 2.1.x unstable with spark.speculation=true
> 
>
> Key: SPARK-19293
> URL: https://issues.apache.org/jira/browse/SPARK-19293
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Damian Momot
>Priority: Critical
>
> After upgrading from Spark 2.0.2 to 2.1.0 we've observed that jobs are often 
> failing when speculative mode is enabled.
> In 2.0.2 speculative tasks were simply skipped if they were not used for 
> result (i.e. other instance finished earlier) - and it was clearly visible in 
> UI that those tasks were not counted as failures.
> In 2.1.0 many tasks are being marked failed/killed when speculative tasks 
> start to run (that is at the end of stage when there are spare executors to 
> use) which also leads to entire stage/job failures.
> Disabling spark.speculation solves failing problem - but speculative mode is 
> very useful especially when different executors run on machines with varying 
> load (for example in YARN)






[jira] [Updated] (SPARK-19293) Spark 2.1.x unstable with spark.speculation=true

2017-07-13 Thread Damian Momot (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Momot updated SPARK-19293:
-
Fix Version/s: 2.2.0

Checked that it's fixed in 2.2.0.

> Spark 2.1.x unstable with spark.speculation=true
> 
>
> Key: SPARK-19293
> URL: https://issues.apache.org/jira/browse/SPARK-19293
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Damian Momot
>Priority: Critical
> Fix For: 2.2.0
>
>
> After upgrading from Spark 2.0.2 to 2.1.0 we've observed that jobs are often 
> failing when speculative mode is enabled.
> In 2.0.2 speculative tasks were simply skipped if they were not used for 
> result (i.e. other instance finished earlier) - and it was clearly visible in 
> UI that those tasks were not counted as failures.
> In 2.1.0 many tasks are being marked failed/killed when speculative tasks 
> start to run (that is at the end of stage when there are spare executors to 
> use) which also leads to entire stage/job failures.
> Disabling spark.speculation solves failing problem - but speculative mode is 
> very useful especially when different executors run on machines with varying 
> load (for example in YARN)






[jira] [Resolved] (SPARK-19293) Spark 2.1.x unstable with spark.speculation=true

2017-07-13 Thread Damian Momot (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Momot resolved SPARK-19293.
--
Resolution: Fixed

> Spark 2.1.x unstable with spark.speculation=true
> 
>
> Key: SPARK-19293
> URL: https://issues.apache.org/jira/browse/SPARK-19293
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Damian Momot
>Priority: Critical
> Fix For: 2.2.0
>
>
> After upgrading from Spark 2.0.2 to 2.1.0 we've observed that jobs are often 
> failing when speculative mode is enabled.
> In 2.0.2 speculative tasks were simply skipped if they were not used for 
> result (i.e. other instance finished earlier) - and it was clearly visible in 
> UI that those tasks were not counted as failures.
> In 2.1.0 many tasks are being marked failed/killed when speculative tasks 
> start to run (that is at the end of stage when there are spare executors to 
> use) which also leads to entire stage/job failures.
> Disabling spark.speculation solves failing problem - but speculative mode is 
> very useful especially when different executors run on machines with varying 
> load (for example in YARN)






[jira] [Commented] (SPARK-4502) Spark SQL reads unneccesary nested fields from Parquet

2017-09-06 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154923#comment-16154923
 ] 

Damian Momot commented on SPARK-4502:
-

Are we waiting for anything? :) Judging by the performance tests in 
https://github.com/apache/spark/pull/16578 the improvements can be huge.

> Spark SQL reads unneccesary nested fields from Parquet
> --
>
> Key: SPARK-4502
> URL: https://issues.apache.org/jira/browse/SPARK-4502
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Liwen Sun
>Priority: Critical
>
> When reading a field of a nested column from Parquet, SparkSQL reads and 
> assemble all the fields of that nested column. This is unnecessary, as 
> Parquet supports fine-grained field reads out of a nested column. This may 
> degrades the performance significantly when a nested column has many fields. 
> For example, I loaded json tweets data into SparkSQL and ran the following 
> query:
> {{SELECT User.contributors_enabled from Tweets;}}
> User is a nested structure that has 38 primitive fields (for Tweets schema, 
> see: https://dev.twitter.com/overview/api/tweets), here is the log message:
> {{14/11/19 16:36:49 INFO InternalParquetRecordReader: Assembled and processed 
> 385779 records from 38 columns in 3976 ms: 97.02691 rec/ms, 3687.0227 
> cell/ms}}
> For comparison, I also ran:
> {{SELECT User FROM Tweets;}}
> And here is the log message:
> {{14/11/19 16:45:40 INFO InternalParquetRecordReader: Assembled and processed 
> 385779 records from 38 columns in 9461 ms: 40.77571 rec/ms, 1549.477 cell/ms}}
> So both queries load 38 columns from Parquet, while the first query only 
> needs 1 column. I also measured the bytes read within Parquet. In these two 
> cases, the same number of bytes (99365194 bytes) were read. 






[jira] [Created] (SPARK-19293) Spark 2.1.0 unstable with spark.speculation=true

2017-01-19 Thread Damian Momot (JIRA)
Damian Momot created SPARK-19293:


 Summary: Spark 2.1.0 unstable with spark.speculation=true
 Key: SPARK-19293
 URL: https://issues.apache.org/jira/browse/SPARK-19293
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.1.0
Reporter: Damian Momot
Priority: Critical


After upgrading from Spark 2.0.2 to 2.1.0 we've observed that jobs are often 
failing when speculative mode is enabled.

In 2.0.2 speculative tasks were simply skipped if they were not used for result 
(i.e. other instance finished earlier) - and it was clearly visible in UI that 
those tasks were not counted as failures.

In 2.1.0 many tasks are being marked failed/killed when speculative tasks start 
to run (that is at the end of stage when there are spare executors to use) 
which also leads to entire stage/job failures.
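
For reference, the configuration knobs involved when speculation is enabled; the 
only non-default value below is spark.speculation=true, the rest are Spark's 
documented defaults, listed purely for illustration:

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.speculation", "true")
  .config("spark.speculation.interval", "100ms")  // how often to check for slow tasks
  .config("spark.speculation.multiplier", "1.5")  // how much slower than the median
  .config("spark.speculation.quantile", "0.75")   // fraction of tasks finished first
  .getOrCreate()
{code}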






[jira] [Updated] (SPARK-19293) Spark 2.1.0 unstable with spark.speculation=true

2017-01-19 Thread Damian Momot (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Momot updated SPARK-19293:
-
Description: 
After upgrading from Spark 2.0.2 to 2.1.0 we've observed that jobs are often 
failing when speculative mode is enabled.

In 2.0.2 speculative tasks were simply skipped if they were not used for result 
(i.e. other instance finished earlier) - and it was clearly visible in UI that 
those tasks were not counted as failures.

In 2.1.0 many tasks are being marked failed/killed when speculative tasks start 
to run (that is at the end of stage when there are spare executors to use) 
which also leads to entire stage/job failures.

Disabling spark.speculation solves failing problem - but speculative mode is 
very useful especially when different executors run on machines with varying 
load (for example in YARN)

  was:
After upgrading from Spark 2.0.2 to 2.1.0 we've observed that jobs are often 
failing when speculative mode is enabled.

In 2.0.2 speculative tasks were simply skipped if they were not used for result 
(i.e. other instance finished earlier) - and it was clearly visible in UI that 
those tasks were not counted as failures.

In 2.1.0 many tasks are being marked failed/killed when speculative tasks start 
to run (that is at the end of stage when there are spare executors to use) 
which also leads to entire stage/job failures.


> Spark 2.1.0 unstable with spark.speculation=true
> 
>
> Key: SPARK-19293
> URL: https://issues.apache.org/jira/browse/SPARK-19293
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Damian Momot
>Priority: Critical
>
> After upgrading from Spark 2.0.2 to 2.1.0 we've observed that jobs are often 
> failing when speculative mode is enabled.
> In 2.0.2 speculative tasks were simply skipped if they were not used for 
> result (i.e. other instance finished earlier) - and it was clearly visible in 
> UI that those tasks were not counted as failures.
> In 2.1.0 many tasks are being marked failed/killed when speculative tasks 
> start to run (that is at the end of stage when there are spare executors to 
> use) which also leads to entire stage/job failures.
> Disabling spark.speculation solves failing problem - but speculative mode is 
> very useful especially when different executors run on machines with varying 
> load (for example in YARN)






[jira] [Created] (SPARK-26037) ./dev/make-distribution.sh fails for Scala 2.12

2018-11-13 Thread Damian Momot (JIRA)
Damian Momot created SPARK-26037:


 Summary: ./dev/make-distribution.sh fails for Scala 2.12
 Key: SPARK-26037
 URL: https://issues.apache.org/jira/browse/SPARK-26037
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 2.4.0
Reporter: Damian Momot


Trying to create a Spark distribution for Scala 2.12 and Spark 2.4.0.

First add the Cloudera repository:
{code:java}
<repository>
  <id>cloudera</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
{code}
Try to make distribution:
{code:java}
./dev/change-scala-version.sh 2.12
./dev/make-distribution.sh --name scala_2.12-cdh5.15.1 --tgz -Phadoop-2.6 
-Dhadoop.version=2.6.0-cdh5.15.1 -DskipTests -Phive -Phive-thriftserver -Pyarn 
-Dscala-2.12{code}
I'm getting multiple failures in Spark Project Sketch - they all look the same; 
here's the first one for reference:
{code:java}
~/spark-2.4.0/common/sketch/src/test/scala/org/apache/spark/util/sketch/BitArraySuite.scala:32:
 exception during macro expansion: 
java.lang.ClassNotFoundException: scala.runtime.LazyRef
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.scalactic.MacroOwnerRepair$Utils.repairOwners(MacroOwnerRepair.scala:66)
at org.scalactic.MacroOwnerRepair.repairOwners(MacroOwnerRepair.scala:46)
at org.scalactic.BooleanMacro.genMacro(BooleanMacro.scala:837)
at org.scalatest.AssertionsMacro$.assert(AssertionsMacro.scala:34)
assert(new BitArray(64).bitSize() == 64)
{code}
The same commands for Scala 2.11 work fine.






[jira] [Commented] (SPARK-26037) ./dev/make-distribution.sh fails for Scala 2.12

2018-11-13 Thread Damian Momot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684982#comment-16684982
 ] 

Damian Momot commented on SPARK-26037:
--

[~srowen] you might be interested :)

> ./dev/make-distribution.sh fails for Scala 2.12
> ---
>
> Key: SPARK-26037
> URL: https://issues.apache.org/jira/browse/SPARK-26037
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Damian Momot
>Priority: Major
>
> Trying to create spark distribution for scala 2.12 and Spark 2.4.0
> First add cloudera repository:
> {code:java}
> <repository>
>   <id>cloudera</id>
>   <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
> </repository>
> {code}
> Try to make distribution:
> {code:java}
> ./dev/change-scala-version.sh 2.12
> ./dev/make-distribution.sh --name scala_2.12-cdh5.15.1 --tgz -Phadoop-2.6 
> -Dhadoop.version=2.6.0-cdh5.15.1 -DskipTests -Phive -Phive-thriftserver 
> -Pyarn -Dscala-2.12{code}
> I'm getting multiple failures in Spark Project Sketch - all look the same, 
> here's first for reference
> {code:java}
> ~/spark-2.4.0/common/sketch/src/test/scala/org/apache/spark/util/sketch/BitArraySuite.scala:32:
>  exception during macro expansion: 
> java.lang.ClassNotFoundException: scala.runtime.LazyRef
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at 
> org.scalactic.MacroOwnerRepair$Utils.repairOwners(MacroOwnerRepair.scala:66)
> at org.scalactic.MacroOwnerRepair.repairOwners(MacroOwnerRepair.scala:46)
> at org.scalactic.BooleanMacro.genMacro(BooleanMacro.scala:837)
> at org.scalatest.AssertionsMacro$.assert(AssertionsMacro.scala:34)
> assert(new BitArray(64).bitSize() == 64)
> {code}
> Same commands for Scala 2.11 work fine






[jira] [Commented] (SPARK-26037) ./dev/make-distribution.sh fails for Scala 2.12

2018-11-13 Thread Damian Momot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685010#comment-16685010
 ] 

Damian Momot commented on SPARK-26037:
--

Oh, never mind - it should be "-Pscala-2.12" instead of "-Dscala-2.12":

 
./dev/make-distribution.sh --name scala_2.12-cdh5.15.1 --tgz -Phadoop-2.6 
-Dhadoop.version=2.6.0-cdh5.15.1 -DskipTests -Phive -Phive-thriftserver -Pyarn 
-Pscala-2.12

> ./dev/make-distribution.sh fails for Scala 2.12
> ---
>
> Key: SPARK-26037
> URL: https://issues.apache.org/jira/browse/SPARK-26037
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Damian Momot
>Priority: Major
>
> Trying to create spark distribution for scala 2.12 and Spark 2.4.0
> First add cloudera repository:
> {code:java}
> <repository>
>   <id>cloudera</id>
>   <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
> </repository>
> {code}
> Try to make distribution:
> {code:java}
> ./dev/change-scala-version.sh 2.12
> ./dev/make-distribution.sh --name scala_2.12-cdh5.15.1 --tgz -Phadoop-2.6 
> -Dhadoop.version=2.6.0-cdh5.15.1 -DskipTests -Phive -Phive-thriftserver 
> -Pyarn -Dscala-2.12{code}
> I'm getting multiple failures in Spark Project Sketch - all look the same, 
> here's first for reference
> {code:java}
> ~/spark-2.4.0/common/sketch/src/test/scala/org/apache/spark/util/sketch/BitArraySuite.scala:32:
>  exception during macro expansion: 
> java.lang.ClassNotFoundException: scala.runtime.LazyRef
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at 
> org.scalactic.MacroOwnerRepair$Utils.repairOwners(MacroOwnerRepair.scala:66)
> at org.scalactic.MacroOwnerRepair.repairOwners(MacroOwnerRepair.scala:46)
> at org.scalactic.BooleanMacro.genMacro(BooleanMacro.scala:837)
> at org.scalatest.AssertionsMacro$.assert(AssertionsMacro.scala:34)
> assert(new BitArray(64).bitSize() == 64)
> {code}
> Same commands for Scala 2.11 work fine






[jira] [Resolved] (SPARK-26037) ./dev/make-distribution.sh fails for Scala 2.12

2018-11-13 Thread Damian Momot (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Momot resolved SPARK-26037.
--
Resolution: Not A Problem

> ./dev/make-distribution.sh fails for Scala 2.12
> ---
>
> Key: SPARK-26037
> URL: https://issues.apache.org/jira/browse/SPARK-26037
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Damian Momot
>Priority: Major
>
> Trying to create spark distribution for scala 2.12 and Spark 2.4.0
> First add cloudera repository:
> {code:java}
> <repository>
>   <id>cloudera</id>
>   <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
> </repository>
> {code}
> Try to make distribution:
> {code:java}
> ./dev/change-scala-version.sh 2.12
> ./dev/make-distribution.sh --name scala_2.12-cdh5.15.1 --tgz -Phadoop-2.6 
> -Dhadoop.version=2.6.0-cdh5.15.1 -DskipTests -Phive -Phive-thriftserver 
> -Pyarn -Dscala-2.12{code}
> I'm getting multiple failures in Spark Project Sketch - all look the same, 
> here's first for reference
> {code:java}
> ~/spark-2.4.0/common/sketch/src/test/scala/org/apache/spark/util/sketch/BitArraySuite.scala:32:
>  exception during macro expansion: 
> java.lang.ClassNotFoundException: scala.runtime.LazyRef
> at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> at 
> org.scalactic.MacroOwnerRepair$Utils.repairOwners(MacroOwnerRepair.scala:66)
> at org.scalactic.MacroOwnerRepair.repairOwners(MacroOwnerRepair.scala:46)
> at org.scalactic.BooleanMacro.genMacro(BooleanMacro.scala:837)
> at org.scalatest.AssertionsMacro$.assert(AssertionsMacro.scala:34)
> assert(new BitArray(64).bitSize() == 64)
> {code}
> Same commands for Scala 2.11 work fine






[jira] [Commented] (SPARK-4502) Spark SQL reads unneccesary nested fields from Parquet

2017-11-14 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16251354#comment-16251354
 ] 

Damian Momot commented on SPARK-4502:
-

Well, this PR is ready: https://github.com/apache/spark/pull/16578 but it 
still awaits acceptance.
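
For context, a minimal sketch of the access pattern the quoted issue below 
describes - selecting a single primitive field out of a wide nested struct. 
The input path is a hypothetical placeholder, and the pruning flag at the end 
is an assumption about releases where this work has landed:

{code}
// Illustrative sketch only; the parquet path is a placeholder, not from this ticket.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("nested-field-read").getOrCreate()

// Reading 1 leaf of a ~38-field nested "User" struct; without nested schema
// pruning, Parquet assembles the whole struct even though one field is needed.
val tweets = spark.read.parquet("/path/to/tweets")
tweets.select("User.contributors_enabled").show()

// In releases where the linked pruning work has landed, it is gated by a SQL
// conf (an assumption about later versions, not available in 1.1.0):
// spark.conf.set("spark.sql.optimizer.nestedSchemaPruning.enabled", "true")
{code}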

> Spark SQL reads unneccesary nested fields from Parquet
> --
>
> Key: SPARK-4502
> URL: https://issues.apache.org/jira/browse/SPARK-4502
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Liwen Sun
>Priority: Critical
>
> When reading a field of a nested column from Parquet, SparkSQL reads and 
> assemble all the fields of that nested column. This is unnecessary, as 
> Parquet supports fine-grained field reads out of a nested column. This may 
> degrade the performance significantly when a nested column has many fields. 
> For example, I loaded json tweets data into SparkSQL and ran the following 
> query:
> {{SELECT User.contributors_enabled from Tweets;}}
> User is a nested structure that has 38 primitive fields (for Tweets schema, 
> see: https://dev.twitter.com/overview/api/tweets), here is the log message:
> {{14/11/19 16:36:49 INFO InternalParquetRecordReader: Assembled and processed 
> 385779 records from 38 columns in 3976 ms: 97.02691 rec/ms, 3687.0227 
> cell/ms}}
> For comparison, I also ran:
> {{SELECT User FROM Tweets;}}
> And here is the log message:
> {{14/11/19 16:45:40 INFO InternalParquetRecordReader: Assembled and processed 
> 385779 records from 38 columns in 9461 ms: 40.77571 rec/ms, 1549.477 cell/ms}}
> So both queries load 38 columns from Parquet, while the first query only 
> needs 1 column. I also measured the bytes read within Parquet. In these two 
> cases, the same number of bytes (99365194 bytes) were read. 






[jira] [Created] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "u

2017-12-28 Thread Damian Momot (JIRA)
Damian Momot created SPARK-22918:


 Summary: sbt test (spark - local) fail after upgrading to 2.2.1 
with: java.security.AccessControlException: access denied 
org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
 Key: SPARK-22918
 URL: https://issues.apache.org/jira/browse/SPARK-22918
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.1
Reporter: Damian Momot


After upgrading 2.2.0 -> 2.2.1 sbt test command started to fail with following 
exception:

{noformat}
java.security.AccessControlException: access denied 
org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
at 
java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
at 
java.security.AccessController.checkPermission(AccessController.java:884)
at 
org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown
 Source)
at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown 
Source)
at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.(Unknown Source)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at 
org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
at 
org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.(ConnectionFactoryImpl.java:85)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at 
org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
at 
org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
at 
org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:240)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:286)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at 
org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
at 
org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.jdo.JDOHelper$16.run(JDOHelper.jav

[jira] [Updated] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "u

2017-12-28 Thread Damian Momot (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Momot updated SPARK-22918:
-
Description: 
After upgrading 2.2.0 -> 2.2.1 sbt test command started to fail with following 
exception:

{noformat}
java.security.AccessControlException: access denied 
org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
at 
java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
at 
java.security.AccessController.checkPermission(AccessController.java:884)
at 
org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown
 Source)
at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown 
Source)
at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.(Unknown Source)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at 
org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
at 
org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.(ConnectionFactoryImpl.java:85)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at 
org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
at 
org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
at 
org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:240)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:286)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at 
org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
at 
org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
at java.security.AccessController.doPrivileged(Native Method)
at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
at 
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
   

[jira] [Updated] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "u

2017-12-28 Thread Damian Momot (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Momot updated SPARK-22918:
-
Description: 
After upgrading 2.2.0 -> 2.2.1 sbt test command started to fail with following 
exception:

{noformat}
java.security.AccessControlException: access denied 
org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
at 
java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
at 
java.security.AccessController.checkPermission(AccessController.java:884)
at 
org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown
 Source)
at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown 
Source)
at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.(Unknown Source)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at 
org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
at 
org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.(ConnectionFactoryImpl.java:85)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at 
org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
at 
org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
at 
org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:240)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:286)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at 
org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
at 
org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
at java.security.AccessController.doPrivileged(Native Method)
at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
at 
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
   

[jira] [Updated] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "u

2017-12-28 Thread Damian Momot (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Momot updated SPARK-22918:
-
Description: 
After upgrading 2.2.0 -> 2.2.1 sbt test command in one of my projects started 
to fail with following exception:

{noformat}
java.security.AccessControlException: access denied 
org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
at 
java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
at 
java.security.AccessController.checkPermission(AccessController.java:884)
at 
org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown
 Source)
at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown 
Source)
at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.(Unknown Source)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at 
org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
at 
org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.(ConnectionFactoryImpl.java:85)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at 
org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
at 
org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
at 
org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:240)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:286)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at 
org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
at 
org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
at java.security.AccessController.doPrivileged(Native Method)
at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
at 
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHe

[jira] [Commented] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine",

2017-12-29 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306074#comment-16306074
 ] 

Damian Momot commented on SPARK-22918:
--

Nope, everything is working fine on Spark 2.2.0 - no security-manager-related 
configuration at all. Just changing the dependency to 2.2.1 causes this problem :(

Also, somebody else has a similar problem here: 
https://stackoverflow.com/questions/48008343/sbt-test-does-not-work-for-spark-test
 - so I'm not the only one :)

Unfortunately I also don't know how to disable the security manager with sbt 
test - I tried to do that but couldn't find any information.
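
One thing that might be worth trying (just a guess on my part, not something I 
have verified): forking the test JVM, so the tests no longer run under the 
security manager sbt installs for in-process test runs:

{code}
// build.sbt sketch - a possible mitigation, not a verified fix. Forked tests
// run in a fresh JVM that does not inherit sbt's own security manager.
fork in Test := true
{code}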

> sbt test (spark - local) fail after upgrading to 2.2.1 with: 
> java.security.AccessControlException: access denied 
> org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
> 
>
> Key: SPARK-22918
> URL: https://issues.apache.org/jira/browse/SPARK-22918
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Damian Momot
>
> After upgrading 2.2.0 -> 2.2.1 sbt test command in one of my projects started 
> to fail with following exception:
> {noformat}
> java.security.AccessControlException: access denied 
> org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
>   at 
> java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
>   at 
> java.security.AccessController.checkPermission(AccessController.java:884)
>   at 
> org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown
>  Source)
>   at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown 
> Source)
>   at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
>   at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
>   at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
>   at org.apache.derby.jdbc.EmbeddedDriver.(Unknown Source)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at java.lang.Class.newInstance(Class.java:442)
>   at 
> org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
>   at 
> org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.(ConnectionFactoryImpl.java:85)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
>   at 
> org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
>   at 
> org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:240)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:286)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
>   at 
> org.datanucleus.NucleusContext.cre

[jira] [Updated] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "u

2017-12-29 Thread Damian Momot (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Momot updated SPARK-22918:
-
Description: 
After upgrading 2.2.0 -> 2.2.1 sbt test command in one of my projects started 
to fail with following exception:

{noformat}
java.security.AccessControlException: access denied 
org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
at 
java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
at 
java.security.AccessController.checkPermission(AccessController.java:884)
at 
org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown
 Source)
at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown 
Source)
at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.(Unknown Source)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at 
org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
at 
org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.(ConnectionFactoryImpl.java:85)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at 
org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
at 
org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
at 
org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:240)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:286)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at 
org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
at 
org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
at java.security.AccessController.doPrivileged(Native Method)
at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
at 
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHe

[jira] [Commented] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine",

2017-12-29 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306297#comment-16306297
 ] 

Damian Momot commented on SPARK-22918:
--

Spark uses Derby for local (test) runs - for the metastore etc. Something 
changed for sure between 2.2.0 and 2.2.1 regarding that usage (a simple 
dependency change back to 2.2.0 removes the problem, going back to 2.2.1 
brings it back again).

> sbt test (spark - local) fail after upgrading to 2.2.1 with: 
> java.security.AccessControlException: access denied 
> org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
> 
>
> Key: SPARK-22918
> URL: https://issues.apache.org/jira/browse/SPARK-22918
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Damian Momot
>
> After upgrading 2.2.0 -> 2.2.1 sbt test command in one of my projects started 
> to fail with following exception:
> {noformat}
> java.security.AccessControlException: access denied 
> org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
>   at 
> java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
>   at 
> java.security.AccessController.checkPermission(AccessController.java:884)
>   at 
> org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown
>  Source)
>   at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown 
> Source)
>   at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
>   at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
>   at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
>   at org.apache.derby.jdbc.EmbeddedDriver.(Unknown Source)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at java.lang.Class.newInstance(Class.java:442)
>   at 
> org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
>   at 
> org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.(ConnectionFactoryImpl.java:85)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
>   at 
> org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
>   at 
> org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:240)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:286)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
>   at 
> org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
>   at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
>   at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfigura

[jira] [Comment Edited] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engi

2017-12-29 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306297#comment-16306297
 ] 

Damian Momot edited comment on SPARK-22918 at 12/29/17 2:01 PM:


Spark uses Derby for local (test) runs - for the metastore etc. Something 
changed for sure between 2.2.0 and 2.2.1 regarding that usage (a simple 
dependency change back to 2.2.0 removes the problem, going back to 2.2.1 
brings it back again).

i.e. I'm not using Derby myself. Spark spawns it as part of a Spark session 
with Hive support.


was (Author: daimon):
Spark uses derby for local (test) runs - for metastore etc. Something changed 
for sure between 2.2.0 -> 2.2.1 regarding that usage (simple dependnecy change 
to 2.2.0 removes that problem, getting back to 2.2.1 brings it back again)

> sbt test (spark - local) fail after upgrading to 2.2.1 with: 
> java.security.AccessControlException: access denied 
> org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
> 
>
> Key: SPARK-22918
> URL: https://issues.apache.org/jira/browse/SPARK-22918
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Damian Momot
>
> After upgrading 2.2.0 -> 2.2.1 sbt test command in one of my projects started 
> to fail with following exception:
> {noformat}
> java.security.AccessControlException: access denied 
> org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
>   at 
> java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
>   at 
> java.security.AccessController.checkPermission(AccessController.java:884)
>   at 
> org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown
>  Source)
>   at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown 
> Source)
>   at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
>   at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
>   at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
>   at org.apache.derby.jdbc.EmbeddedDriver.(Unknown Source)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at java.lang.Class.newInstance(Class.java:442)
>   at 
> org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
>   at 
> org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.(ConnectionFactoryImpl.java:85)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
>   at 
> org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
>   at 
> org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:240)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:286)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtens

[jira] [Comment Edited] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engi

2017-12-29 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306316#comment-16306316
 ] 

Damian Momot edited comment on SPARK-22918 at 12/29/17 2:49 PM:


As a quick workaround, the following can be used in tests just before 
instantiating the Spark session:

{code}
System.setSecurityManager(null)
{code}

Anyway, as there are already two reports of this problem, I've got a feeling 
that others will be affected.


was (Author: daimon):
As a quick workaround following can be used in tests just before instantiating 
spark session:

{code}
System.setSecurityManager(null)
{code}

Anyway as they're already 2 reports of this problem I've got filling that 
others may be affected

> sbt test (spark - local) fail after upgrading to 2.2.1 with: 
> java.security.AccessControlException: access denied 
> org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
> 
>
> Key: SPARK-22918
> URL: https://issues.apache.org/jira/browse/SPARK-22918
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Damian Momot
>
> After upgrading 2.2.0 -> 2.2.1 sbt test command in one of my projects started 
> to fail with following exception:
> {noformat}
> java.security.AccessControlException: access denied 
> org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
>   at 
> java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
>   at 
> java.security.AccessController.checkPermission(AccessController.java:884)
>   at 
> org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown
>  Source)
>   at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown 
> Source)
>   at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
>   at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
>   at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
>   at org.apache.derby.jdbc.EmbeddedDriver.(Unknown Source)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at java.lang.Class.newInstance(Class.java:442)
>   at 
> org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
>   at 
> org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.(ConnectionFactoryImpl.java:85)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
>   at 
> org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
>   at 
> org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:240)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:286)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>   at 
> org.datanucleus.plugin.Plu

[jira] [Commented] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine",

2017-12-29 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306316#comment-16306316
 ] 

Damian Momot commented on SPARK-22918:
--

As a quick workaround, the following can be used in tests just before 
instantiating the Spark session:

{code}
System.setSecurityManager(null)
{code}

Anyway, as there are already two reports of this problem, I've got a feeling 
that others may be affected.
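
Expanded into a test-shaped sketch (the suite name, ScalaTest usage and session 
options are illustrative assumptions - only the System.setSecurityManager(null) 
line comes from the workaround above):

{code}
import org.apache.spark.sql.SparkSession
import org.scalatest.{BeforeAndAfterAll, FunSuite}

// Sketch only: the workaround has to run before the Hive-enabled session boots
// the embedded Derby metastore.
class MetastoreSmokeSuite extends FunSuite with BeforeAndAfterAll {

  private var spark: SparkSession = _

  override def beforeAll(): Unit = {
    System.setSecurityManager(null)   // workaround for the Derby SystemPermission check
    spark = SparkSession.builder()
      .master("local[*]")
      .appName("metastore-smoke-test")
      .enableHiveSupport()
      .getOrCreate()
  }

  override def afterAll(): Unit = {
    if (spark != null) spark.stop()
  }

  test("embedded metastore comes up") {
    assert(spark.sql("SHOW DATABASES").count() >= 1)
  }
}
{code}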

> sbt test (spark - local) fail after upgrading to 2.2.1 with: 
> java.security.AccessControlException: access denied 
> org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
> 
>
> Key: SPARK-22918
> URL: https://issues.apache.org/jira/browse/SPARK-22918
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Damian Momot
>
> After upgrading 2.2.0 -> 2.2.1 sbt test command in one of my projects started 
> to fail with following exception:
> {noformat}
> java.security.AccessControlException: access denied 
> org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
>   at 
> java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
>   at 
> java.security.AccessController.checkPermission(AccessController.java:884)
>   at 
> org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown
>  Source)
>   at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown 
> Source)
>   at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
>   at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
>   at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
>   at org.apache.derby.jdbc.EmbeddedDriver.(Unknown Source)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at java.lang.Class.newInstance(Class.java:442)
>   at 
> org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
>   at 
> org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.(ConnectionFactoryImpl.java:85)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
>   at 
> org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
>   at 
> org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:240)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:286)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
>   at 
> org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
>   at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
>   at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeCo

[jira] [Comment Edited] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engi

2017-12-29 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306316#comment-16306316
 ] 

Damian Momot edited comment on SPARK-22918 at 12/29/17 2:56 PM:


As a quick workaround, the following can be used in tests just before 
instantiating the Spark session:

{code}
System.setSecurityManager(null)
{code}

Anyway, as there are already two reports of this problem, I've got a feeling 
that others will be affected.


was (Author: daimon):
As a quick workaround following can be used in tests just before instantiating 
spark session:

{code}
System.setSecurityManager(null)
{code}

Anyway as they're already 2 reports of this problem I've got filling that 
others will be affected

> sbt test (spark - local) fail after upgrading to 2.2.1 with: 
> java.security.AccessControlException: access denied 
> org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
> 
>
> Key: SPARK-22918
> URL: https://issues.apache.org/jira/browse/SPARK-22918
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Damian Momot
>
> After upgrading 2.2.0 -> 2.2.1, the sbt test command in one of my projects 
> started to fail with the following exception:
> {noformat}
> java.security.AccessControlException: access denied 
> org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
>   at 
> java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
>   at 
> java.security.AccessController.checkPermission(AccessController.java:884)
>   at 
> org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown
>  Source)
>   at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown 
> Source)
>   at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
>   at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
>   at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
>   at org.apache.derby.jdbc.EmbeddedDriver.<init>(Unknown Source)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at java.lang.Class.newInstance(Class.java:442)
>   at 
> org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
>   at 
> org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:85)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
>   at 
> org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
>   at 
> org.datanucleus.store.AbstractStoreManager.<init>(AbstractStoreManager.java:240)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:286)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>   at 
> org.datanucleus.plugin.

[jira] [Comment Edited] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engi

2018-01-01 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306316#comment-16306316
 ] 

Damian Momot edited comment on SPARK-22918 at 1/2/18 7:37 AM:
--

As a quick workaround, the following can be used in tests just before 
instantiating the Spark session:

{code}
System.setSecurityManager(null)
{code}

Anyway, as there are already 2 reports of this problem, I've got a feeling that 
others will be affected.


was (Author: daimon):
As a quick workaround, the following can be used in tests just before 
instantiating the Spark session:

{code}
System.setSecurityManager(null)
{code}

Anyway, as there are already 2 reports of this problem, I've got a feeling that 
others will be affected.

> sbt test (spark - local) fail after upgrading to 2.2.1 with: 
> java.security.AccessControlException: access denied 
> org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
> 
>
> Key: SPARK-22918
> URL: https://issues.apache.org/jira/browse/SPARK-22918
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Damian Momot
>
> After upgrading 2.2.0 -> 2.2.1, the sbt test command in one of my projects 
> started to fail with the following exception:
> {noformat}
> java.security.AccessControlException: access denied 
> org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
>   at 
> java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
>   at 
> java.security.AccessController.checkPermission(AccessController.java:884)
>   at 
> org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown
>  Source)
>   at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown 
> Source)
>   at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
>   at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
>   at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
>   at org.apache.derby.jdbc.EmbeddedDriver.<init>(Unknown Source)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at java.lang.Class.newInstance(Class.java:442)
>   at 
> org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
>   at 
> org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:85)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
>   at 
> org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
>   at 
> org.datanucleus.store.AbstractStoreManager.<init>(AbstractStoreManager.java:240)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:286)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>   at 
> org.datanucleus.plugin.P

[jira] [Comment Edited] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engi

2018-01-01 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306316#comment-16306316
 ] 

Damian Momot edited comment on SPARK-22918 at 1/2/18 7:37 AM:
--

As a quick workaround, the following can be used in tests just before 
instantiating the Spark session:

{code}
System.setSecurityManager(null)
{code}

Anyway, as there are already 2 reports of this problem, I've got a feeling that 
others will be affected.


was (Author: daimon):
As a quick workaround, the following can be used in tests just before 
instantiating the Spark session:

{code}
System.setSecurityManager(null)
{code}

Anyway, as there are already 2 reports of this problem, I've got a feeling that 
others will be affected.

> sbt test (spark - local) fail after upgrading to 2.2.1 with: 
> java.security.AccessControlException: access denied 
> org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
> 
>
> Key: SPARK-22918
> URL: https://issues.apache.org/jira/browse/SPARK-22918
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Damian Momot
>
> After upgrading 2.2.0 -> 2.2.1, the sbt test command in one of my projects 
> started to fail with the following exception:
> {noformat}
> java.security.AccessControlException: access denied 
> org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
>   at 
> java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
>   at 
> java.security.AccessController.checkPermission(AccessController.java:884)
>   at 
> org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown
>  Source)
>   at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown 
> Source)
>   at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
>   at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
>   at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
>   at org.apache.derby.jdbc.EmbeddedDriver.<init>(Unknown Source)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at java.lang.Class.newInstance(Class.java:442)
>   at 
> org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
>   at 
> org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:85)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
>   at 
> org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
>   at 
> org.datanucleus.store.AbstractStoreManager.<init>(AbstractStoreManager.java:240)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:286)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>   at 
> org.datanucleus.plugin.P

[jira] [Commented] (SPARK-4502) Spark SQL reads unnecessary nested fields from Parquet

2018-08-27 Thread Damian Momot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16594546#comment-16594546
 ] 

Damian Momot commented on SPARK-4502:
-

I can see that this ticket was closed, but looking at 
[https://github.com/apache/spark/pull/21320], only a very basic scenario is 
supported and the feature itself is disabled by default.

Are there any follow-up tickets to track the full feature implementation?
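
For anyone hitting this, a hedged sketch of turning on the pruning delivered by 
that PR. The config key below is the flag shipped with the 2.4 release; treat it 
as an assumption for other versions, and the path and column names are purely 
illustrative:

{code}
import org.apache.spark.sql.SparkSession

// Nested schema pruning is disabled by default, so it has to be enabled explicitly.
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.optimizer.nestedSchemaPruning.enabled", "true")
  .getOrCreate()

// With the flag on, selecting a single nested field should read only that leaf
// column from Parquet instead of assembling the whole struct.
spark.read.parquet("/path/to/tweets")
  .selectExpr("User.contributors_enabled")
  .explain()
{code}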

> Spark SQL reads unnecessary nested fields from Parquet
> --
>
> Key: SPARK-4502
> URL: https://issues.apache.org/jira/browse/SPARK-4502
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Liwen Sun
>Assignee: Michael Allman
>Priority: Critical
> Fix For: 2.4.0
>
>
> When reading a field of a nested column from Parquet, SparkSQL reads and 
> assembles all the fields of that nested column. This is unnecessary, as 
> Parquet supports fine-grained field reads out of a nested column. This may 
> degrade the performance significantly when a nested column has many fields. 
> For example, I loaded json tweets data into SparkSQL and ran the following 
> query:
> {{SELECT User.contributors_enabled from Tweets;}}
> User is a nested structure that has 38 primitive fields (for Tweets schema, 
> see: https://dev.twitter.com/overview/api/tweets), here is the log message:
> {{14/11/19 16:36:49 INFO InternalParquetRecordReader: Assembled and processed 
> 385779 records from 38 columns in 3976 ms: 97.02691 rec/ms, 3687.0227 
> cell/ms}}
> For comparison, I also ran:
> {{SELECT User FROM Tweets;}}
> And here is the log message:
> {{14/11/19 16:45:40 INFO InternalParquetRecordReader: Assembled and processed 
> 385779 records from 38 columns in 9461 ms: 40.77571 rec/ms, 1549.477 cell/ms}}
> So both queries load 38 columns from Parquet, while the first query only 
> needs 1 column. I also measured the bytes read within Parquet. In these two 
> cases, the same number of bytes (99365194 bytes) were read. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4502) Spark SQL reads unnecessary nested fields from Parquet

2018-03-01 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381752#comment-16381752
 ] 

Damian Momot commented on SPARK-4502:
-

[~sameerag] any chance to put a higher priority on this now that 2.3.0 is 
released? It has huge performance improvement potential for nested Parquet data 
reads.

The original PR was created by [~michael] more than a year ago (2017-01-13) and 
still hasn't been fully reviewed.

> Spark SQL reads unnecessary nested fields from Parquet
> --
>
> Key: SPARK-4502
> URL: https://issues.apache.org/jira/browse/SPARK-4502
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Liwen Sun
>Priority: Critical
>
> When reading a field of a nested column from Parquet, SparkSQL reads and 
> assembles all the fields of that nested column. This is unnecessary, as 
> Parquet supports fine-grained field reads out of a nested column. This may 
> degrade the performance significantly when a nested column has many fields. 
> For example, I loaded json tweets data into SparkSQL and ran the following 
> query:
> {{SELECT User.contributors_enabled from Tweets;}}
> User is a nested structure that has 38 primitive fields (for Tweets schema, 
> see: https://dev.twitter.com/overview/api/tweets), here is the log message:
> {{14/11/19 16:36:49 INFO InternalParquetRecordReader: Assembled and processed 
> 385779 records from 38 columns in 3976 ms: 97.02691 rec/ms, 3687.0227 
> cell/ms}}
> For comparison, I also ran:
> {{SELECT User FROM Tweets;}}
> And here is the log message:
> {{14/11/19 16:45:40 INFO InternalParquetRecordReader: Assembled and processed 
> 385779 records from 38 columns in 9461 ms: 40.77571 rec/ms, 1549.477 cell/ms}}
> So both queries load 38 columns from Parquet, while the first query only 
> needs 1 column. I also measured the bytes read within Parquet. In these two 
> cases, the same number of bytes (99365194 bytes) were read. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11293) Spillable collections leak shuffle memory

2016-03-21 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204064#comment-15204064
 ] 

Damian Momot commented on SPARK-11293:
--

It looks like we are getting the same errors on 1.6.0 too.

> Spillable collections leak shuffle memory
> -
>
> Key: SPARK-11293
> URL: https://issues.apache.org/jira/browse/SPARK-11293
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.1, 1.4.1, 1.5.1, 1.6.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Critical
>
> I discovered multiple leaks of shuffle memory while working on my memory 
> manager consolidation patch, which added the ability to do strict memory leak 
> detection for the bookkeeping that used to be performed by the 
> ShuffleMemoryManager. This uncovered a handful of places where tasks can 
> acquire execution/shuffle memory but never release it, starving themselves of 
> memory.
> Problems that I found:
> * {{ExternalSorter.stop()}} should release the sorter's shuffle/execution 
> memory.
> * BlockStoreShuffleReader should call {{ExternalSorter.stop()}} using a 
> {{CompletionIterator}} (see the sketch after this list).
> * {{ExternalAppendOnlyMap}} exposes no equivalent of {{stop()}} for freeing 
> its resources.
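
A generic sketch of the completion-iterator pattern referenced in the second 
bullet above, written against plain Scala iterators rather than Spark's internal 
class, purely to illustrate tying resource release to iterator exhaustion (all 
names here are illustrative):

{code}
// Wraps an iterator and runs `cleanup` exactly once when the underlying iterator
// is exhausted, which is how a reader could guarantee that something like
// ExternalSorter.stop() eventually gets called.
class CleanupIterator[A](underlying: Iterator[A], cleanup: () => Unit) extends Iterator[A] {
  private var cleaned = false

  override def hasNext: Boolean = {
    val more = underlying.hasNext
    if (!more && !cleaned) {
      cleaned = true
      cleanup()
    }
    more
  }

  override def next(): A = underlying.next()
}

// Usage sketch: the consumer just drains the iterator; cleanup fires automatically.
val wrapped = new CleanupIterator[Int](Iterator(1, 2, 3), () => println("resources released"))
wrapped.foreach(println)
{code}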



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-11293) Spillable collections leak shuffle memory

2016-03-21 Thread Damian Momot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15204064#comment-15204064
 ] 

Damian Momot edited comment on SPARK-11293 at 3/21/16 11:36 AM:


It looks like we are getting the same errors on 1.6.0 too - any plans to address it?


was (Author: daimon):
It looks like we are getting the same errors on 1.6.0 too.

> Spillable collections leak shuffle memory
> -
>
> Key: SPARK-11293
> URL: https://issues.apache.org/jira/browse/SPARK-11293
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.1, 1.4.1, 1.5.1, 1.6.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Critical
>
> I discovered multiple leaks of shuffle memory while working on my memory 
> manager consolidation patch, which added the ability to do strict memory leak 
> detection for the bookkeeping that used to be performed by the 
> ShuffleMemoryManager. This uncovered a handful of places where tasks can 
> acquire execution/shuffle memory but never release it, starving themselves of 
> memory.
> Problems that I found:
> * {{ExternalSorter.stop()}} should release the sorter's shuffle/execution 
> memory.
> * BlockStoreShuffleReader should call {{ExternalSorter.stop()}} using a 
> {{CompletionIterator}}.
> * {{ExternalAppendOnlyMap}} exposes no equivalent of {{stop()}} for freeing 
> its resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org