[jira] [Commented] (SPARK-33106) Fix sbt resolvers clash

2021-01-16 Thread Alexander Bessonov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266706#comment-17266706
 ] 

Alexander Bessonov commented on SPARK-33106:


My bad. I had the following in my environment, which caused the issue. It 
builds fine without it:
{code:java}
SBT_OPTS="-Dsbt.override.build.repos=true"
{code}
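For context, {{-Dsbt.override.build.repos=true}} tells sbt to ignore the 
resolvers declared by the build and use only the ones listed in the global 
{{~/.sbt/repositories}} file, which can hide the {{ivyLocal}} resolver that 
{{publishLocal}} expects. A hypothetical file of that shape:
{code:none}
[repositories]
  local
  maven-central
{code}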
 

> Fix sbt resolvers clash
> ---
>
> Key: SPARK-33106
> URL: https://issues.apache.org/jira/browse/SPARK-33106
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Denis Pyshev
>Assignee: Denis Pyshev
>Priority: Minor
> Fix For: 3.1.0
>
>
> During the sbt upgrade from 0.13 to 1.x, the exact resolvers list was used 
> as-is.
> That leads to local resolver names clashing, which is observed as a warning 
> from SBT:
> {code:java}
> [warn] Multiple resolvers having different access mechanism configured with 
> same name 'local'. To avoid conflict, Remove duplicate project resolvers 
> (`resolvers`) or rename publishing resolver (`publishTo`).
> {code}
> This needs to be fixed to avoid potential errors and reduce log noise.
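For illustration, a hedged sbt sketch of the kind of rename the warning 
suggests (the names below are examples, not Spark's actual fix):
{code:java}
// Give the publishing resolver a name other than "local" so it cannot clash
// with the project-level ivy "local" resolver.
publishTo := Some(
  Resolver.file("local-publish", file("target/local-repo"))(Resolver.ivyStylePatterns)
)
{code}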






[jira] [Commented] (SPARK-33106) Fix sbt resolvers clash

2021-01-07 Thread Alexander Bessonov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17260903#comment-17260903
 ] 

Alexander Bessonov commented on SPARK-33106:


That doesn't seem to fix the issue: {{build/sbt publishLocal}} now ends with 
the error "Undefined resolver 'ivyLocal'".

> Fix sbt resolvers clash
> ---
>
> Key: SPARK-33106
> URL: https://issues.apache.org/jira/browse/SPARK-33106
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Denis Pyshev
>Assignee: Denis Pyshev
>Priority: Minor
> Fix For: 3.1.0
>
>
> During the sbt upgrade from 0.13 to 1.x, the exact resolvers list was used 
> as-is.
> That leads to local resolver names clashing, which is observed as a warning 
> from SBT:
> {code:java}
> [warn] Multiple resolvers having different access mechanism configured with 
> same name 'local'. To avoid conflict, Remove duplicate project resolvers 
> (`resolvers`) or rename publishing resolver (`publishTo`).
> {code}
> This needs to be fixed to avoid potential errors and reduce log noise.






[jira] [Created] (SPARK-29719) Converted Metastore relations (ORC, Parquet) wouldn't update InMemoryFileIndex

2019-11-01 Thread Alexander Bessonov (Jira)
Alexander Bessonov created SPARK-29719:
--

 Summary: Converted Metastore relations (ORC, Parquet) wouldn't 
update InMemoryFileIndex
 Key: SPARK-29719
 URL: https://issues.apache.org/jira/browse/SPARK-29719
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: Alexander Bessonov


Spark attempts to convert Hive tables backed by Parquet or ORC into internal 
logical relations which cache file locations for the underlying data. That 
cache is not invalidated when the partitioned table is re-read later on, so 
files added to the table in the meantime might be ignored.

{code:java}
val spark = SparkSession.builder()
  .master("yarn")
  .enableHiveSupport()
  .config("spark.sql.hive.caseSensitiveInferenceMode", "NEVER_INFER")
  .getOrCreate()

val df1 = spark.table("my_table").filter("date=20191101")
// Do something with `df1`
// External process writes to the partition
val df2 = spark.table("my_table").filter("date=20191101")
// Do something with `df2`. Data in `df1` and `df2` should be different, but is equal.
{code}
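
A hedged workaround sketch, assuming the stale file listing is the only 
problem: {{spark.catalog.refreshTable}} is a public Catalog API that 
invalidates the cached relation (including its {{InMemoryFileIndex}}), so the 
second read lists the partition's files again.
{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

val df1 = spark.table("my_table").filter("date=20191101")
// ... external process writes new files into the partition ...

// Drop the cached logical relation and its file index before re-reading.
spark.catalog.refreshTable("my_table")
val df2 = spark.table("my_table").filter("date=20191101") // sees the new files
{code}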






[jira] [Updated] (SPARK-29330) Allow users to choose the name of Spark Shuffle service

2019-10-02 Thread Alexander Bessonov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Bessonov updated SPARK-29330:
---
Description: 
As of now, Spark uses the hardcoded value {{spark_shuffle}} as the name of the 
Shuffle Service.

The HDP distribution of Spark, on the other hand, uses 
[{{spark2_shuffle}}|https://github.com/hortonworks/spark2-release/blob/HDP-3.1.0.0-78-tag/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala#L117].
This is done to be able to run both Spark 1.6 and Spark 2.x on the same Hadoop 
cluster.

Running vanilla Spark on an HDP cluster where only the Spark 2.x shuffle 
service (HDP flavor) is running becomes impossible due to the shuffle service 
name mismatch.

  was:
As of now, Spark uses the hardcoded value {{spark_shuffle}} as the name of the 
Shuffle Service.

The HDP distribution of Spark, on the other hand, uses [{{spark2_shuffle}}|#L117]]. 
This is done to be able to run both Spark 1.6 and Spark 2.x on the same Hadoop 
cluster.

Running vanilla Spark on an HDP cluster where only the Spark 2.x shuffle 
service (HDP flavor) is running becomes impossible due to the shuffle service 
name mismatch.


> Allow users to choose the name of Spark Shuffle service
> --
>
> Key: SPARK-29330
> URL: https://issues.apache.org/jira/browse/SPARK-29330
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 2.4.4
>Reporter: Alexander Bessonov
>Priority: Minor
>
> As of now, Spark uses the hardcoded value {{spark_shuffle}} as the name of 
> the Shuffle Service.
> The HDP distribution of Spark, on the other hand, uses 
> [{{spark2_shuffle}}|https://github.com/hortonworks/spark2-release/blob/HDP-3.1.0.0-78-tag/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala#L117].
>  This is done to be able to run both Spark 1.6 and Spark 2.x on the same 
> Hadoop cluster.
> Running vanilla Spark on an HDP cluster where only the Spark 2.x shuffle 
> service (HDP flavor) is running becomes impossible due to the shuffle 
> service name mismatch.
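
For illustration, a sketch of what a user-selectable name could look like on 
the application side. The {{spark.shuffle.service.name}} key is an assumption 
for the affected versions (a key of this name was only added in later Spark 
releases):
{code:java}
import org.apache.spark.sql.SparkSession

// Hypothetical for Spark 2.4.4: register with HDP's "spark2_shuffle" service
// instead of the hardcoded "spark_shuffle".
val spark = SparkSession.builder()
  .config("spark.shuffle.service.enabled", "true")
  .config("spark.shuffle.service.name", "spark2_shuffle")
  .getOrCreate()
{code}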






[jira] [Created] (SPARK-29330) Allow users to choose the name of Spark Shuffle service

2019-10-02 Thread Alexander Bessonov (Jira)
Alexander Bessonov created SPARK-29330:
--

 Summary: Allow users to choose the name of Spark Shuffle service
 Key: SPARK-29330
 URL: https://issues.apache.org/jira/browse/SPARK-29330
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, YARN
Affects Versions: 2.4.4
Reporter: Alexander Bessonov


As of now, Spark uses the hardcoded value {{spark_shuffle}} as the name of the 
Shuffle Service.

The HDP distribution of Spark, on the other hand, uses 
[{{spark2_shuffle}}|https://github.com/hortonworks/spark2-release/blob/HDP-3.1.0.0-78-tag/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala#L117].
This is done to be able to run both Spark 1.6 and Spark 2.x on the same Hadoop 
cluster.

Running vanilla Spark on an HDP cluster where only the Spark 2.x shuffle 
service (HDP flavor) is running becomes impossible due to the shuffle service 
name mismatch.






[jira] [Created] (SPARK-25983) spark-sql-kafka-0-10 no longer works with Kafka 0.10.0

2018-11-08 Thread Alexander Bessonov (JIRA)
Alexander Bessonov created SPARK-25983:
--

 Summary: spark-sql-kafka-0-10 no longer works with Kafka 0.10.0
 Key: SPARK-25983
 URL: https://issues.apache.org/jira/browse/SPARK-25983
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: Alexander Bessonov


Package {{org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0}} is no longer 
compatible with {{org.apache.kafka:kafka_2.11:0.10.0.1}}.

When both packages are used in the same project, the following exception occurs:
{code:java}
java.lang.NoClassDefFoundError: 
org/apache/kafka/common/protocol/SecurityProtocol
 at kafka.server.Defaults$.<init>(KafkaConfig.scala:125)
 at kafka.server.Defaults$.<clinit>(KafkaConfig.scala)
 at kafka.log.Defaults$.<init>(LogConfig.scala:33)
 at kafka.log.Defaults$.<clinit>(LogConfig.scala)
 at kafka.log.LogConfig$.<init>(LogConfig.scala:152)
 at kafka.log.LogConfig$.<clinit>(LogConfig.scala)
 at kafka.server.KafkaConfig$.<init>(KafkaConfig.scala:265)
 at kafka.server.KafkaConfig$.<clinit>(KafkaConfig.scala)
 at kafka.server.KafkaConfig.<init>(KafkaConfig.scala:759)
 at kafka.server.KafkaConfig.<init>(KafkaConfig.scala:761)
{code}
 

This exception is caused by an incompatible transitive dependency pulled in by 
Spark: {{org.apache.kafka:kafka-clients:2.0.0}}.

The following workaround resolved the problem in my project:
{code:java}
dependencyOverrides += "org.apache.kafka" % "kafka-clients" % "0.10.0.1"
{code}
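
For context, a minimal sbt sketch of the conflicting setup and the override 
(the {{Test}} scoping of the broker artifact is an assumption):
{code:java}
// spark-sql-kafka-0-10 2.4.0 pulls in kafka-clients 2.0.0, which no longer
// has org.apache.kafka.common.protocol.SecurityProtocol at that location
// (the class moved in later Kafka releases); pinning kafka-clients back to
// the broker's version restores it.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.0",
  "org.apache.kafka" %% "kafka" % "0.10.0.1" % Test
)
dependencyOverrides += "org.apache.kafka" % "kafka-clients" % "0.10.0.1"
{code}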






[jira] [Commented] (SPARK-23737) Scala API documentation leads to nonexistent pages for sources

2018-03-26 Thread Alexander Bessonov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414111#comment-16414111
 ] 

Alexander Bessonov commented on SPARK-23737:


[~sameerag], making a wild guess: the username in the URL is yours.

> Scala API documentation leads to nonexistent pages for sources
> --
>
> Key: SPARK-23737
> URL: https://issues.apache.org/jira/browse/SPARK-23737
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.3.0
>Reporter: Alexander Bessonov
>Priority: Minor
>
> h3. Steps to reproduce:
>  # Go to [Scala API 
> homepage|http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package].
>  # Click "Source: package.scala"
> h3. Result:
> The link leads to nonexistent page: 
> [https://github.com/apache/spark/tree/v2.3.0/Users/sameera/dev/spark/core/src/main/scala/org/apache/spark/package.scala]
> h3. Expected result:
> The link leads to proper page:
> [https://github.com/apache/spark/tree/v2.3.0/core/src/main/scala/org/apache/spark/package.scala]
>  
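
A plausible root cause, offered as an assumption rather than a verified 
diagnosis: scaladoc substitutes source paths into {{-doc-source-url}} relative 
to {{-sourcepath}}, so when {{-sourcepath}} is not set to the repository root, 
the doc builder's absolute checkout path leaks into the generated links. A 
hypothetical sbt fragment showing the fix shape:
{code:java}
// Relativize source paths against the repo root before substituting them
// into the GitHub URL (€{FILE_PATH} is scaladoc's substitution variable).
Compile / doc / scalacOptions ++= Seq(
  "-sourcepath", (ThisBuild / baseDirectory).value.getAbsolutePath,
  "-doc-source-url", "https://github.com/apache/spark/tree/v2.3.0€{FILE_PATH}.scala"
)
{code}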






[jira] [Reopened] (SPARK-23737) Scala API documentation leads to nonexistent pages for sources

2018-03-26 Thread Alexander Bessonov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Bessonov reopened SPARK-23737:


Okay. The bug isn't fixed and it affects everyone who wants to jump to the 
source code from ScalaDocs.

> Scala API documentation leads to nonexistent pages for sources
> --
>
> Key: SPARK-23737
> URL: https://issues.apache.org/jira/browse/SPARK-23737
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.3.0
>Reporter: Alexander Bessonov
>Priority: Minor
>
> h3. Steps to reproduce:
>  # Go to [Scala API 
> homepage|http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package].
>  # Click "Source: package.scala"
> h3. Result:
> The link leads to nonexistent page: 
> [https://github.com/apache/spark/tree/v2.3.0/Users/sameera/dev/spark/core/src/main/scala/org/apache/spark/package.scala]
> h3. Expected result:
> The link leads to proper page:
> [https://github.com/apache/spark/tree/v2.3.0/core/src/main/scala/org/apache/spark/package.scala]
>  






[jira] [Commented] (SPARK-23737) Scala API documentation leads to nonexistent pages for sources

2018-03-20 Thread Alexander Bessonov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406591#comment-16406591
 ] 

Alexander Bessonov commented on SPARK-23737:


Oh, thanks. Linked them.

> Scala API documentation leads to nonexistent pages for sources
> --
>
> Key: SPARK-23737
> URL: https://issues.apache.org/jira/browse/SPARK-23737
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.3.0
>Reporter: Alexander Bessonov
>Priority: Minor
>
> h3. Steps to reproduce:
>  # Go to [Scala API 
> homepage|http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package].
>  # Click "Source: package.scala"
> h3. Result:
> The link leads to nonexistent page: 
> [https://github.com/apache/spark/tree/v2.3.0/Users/sameera/dev/spark/core/src/main/scala/org/apache/spark/package.scala]
> h3. Expected result:
> The link leads to proper page:
> [https://github.com/apache/spark/tree/v2.3.0/core/src/main/scala/org/apache/spark/package.scala]
>  






[jira] [Updated] (SPARK-23737) Scala API documentation leads to nonexistent pages for sources

2018-03-19 Thread Alexander Bessonov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Bessonov updated SPARK-23737:
---
Description: 
h3. Steps to reproduce:
 # Go to [Scala API 
homepage|http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package].
 # Click "Source: package.scala"

h3. Result:

The link leads to nonexistent page: 
[https://github.com/apache/spark/tree/v2.3.0/Users/sameera/dev/spark/core/src/main/scala/org/apache/spark/package.scala]
h3. Expected result:

The link leads to proper page:

[https://github.com/apache/spark/tree/v2.3.0/core/src/main/scala/org/apache/spark/package.scala]

 

  was:
h3. Steps to reproduce:
 # Go to [Scala API 
homepage|[http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package].]
 # Click "Source: package.scala"

h3. Result:

The link leads to nonexistent page: 
[https://github.com/apache/spark/tree/v2.3.0/Users/sameera/dev/spark/core/src/main/scala/org/apache/spark/package.scala]
h3. Expected result:

The link leads to proper page:

[https://github.com/apache/spark/tree/v2.3.0/core/src/main/scala/org/apache/spark/package.scala]

 


> Scala API documentation leads to nonexistent pages for sources
> --
>
> Key: SPARK-23737
> URL: https://issues.apache.org/jira/browse/SPARK-23737
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.3.0
>Reporter: Alexander Bessonov
>Priority: Minor
>
> h3. Steps to reproduce:
>  # Go to [Scala API 
> homepage|http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package].
>  # Click "Source: package.scala"
> h3. Result:
> The link leads to nonexistent page: 
> [https://github.com/apache/spark/tree/v2.3.0/Users/sameera/dev/spark/core/src/main/scala/org/apache/spark/package.scala]
> h3. Expected result:
> The link leads to proper page:
> [https://github.com/apache/spark/tree/v2.3.0/core/src/main/scala/org/apache/spark/package.scala]
>  






[jira] [Created] (SPARK-23737) Scala API documentation leads to nonexistent pages for sources

2018-03-19 Thread Alexander Bessonov (JIRA)
Alexander Bessonov created SPARK-23737:
--

 Summary: Scala API documentation leads to nonexistent pages for 
sources
 Key: SPARK-23737
 URL: https://issues.apache.org/jira/browse/SPARK-23737
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 2.3.0
Reporter: Alexander Bessonov


h3. Steps to reproduce:
 # Go to [Scala API 
homepage|http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package].
 # Click "Source: package.scala"

h3. Result:

The link leads to nonexistent page: 
[https://github.com/apache/spark/tree/v2.3.0/Users/sameera/dev/spark/core/src/main/scala/org/apache/spark/package.scala]
h3. Expected result:

The link leads to proper page:

[https://github.com/apache/spark/tree/v2.3.0/core/src/main/scala/org/apache/spark/package.scala]

 






[jira] [Commented] (SPARK-17414) Set type is not supported for creating data frames

2017-08-17 Thread Alexander Bessonov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130852#comment-16130852
 ] 

Alexander Bessonov commented on SPARK-17414:


Fixed in SPARK-21204

> Set type is not supported for creating data frames
> --
>
> Key: SPARK-17414
> URL: https://issues.apache.org/jira/browse/SPARK-17414
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Emre Colak
>Priority: Minor
>
> For a case class that has a field of type Set, the createDataFrame() method 
> throws an exception saying "Schema for type Set is not supported". The 
> exception is raised by the org.apache.spark.sql.catalyst.ScalaReflection 
> class, where Array, Seq and Map types are supported but Set is not. It would 
> be nice to support Set here by default instead of having to write a custom 
> Encoder.
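
For versions predating that fix, a hedged workaround sketch: map the {{Set}} 
field to a {{Seq}} before creating the DataFrame (the case class names below 
are illustrative):
{code:java}
import org.apache.spark.sql.SparkSession

case class WithSet(id: Int, tags: Set[String]) // schema derivation fails pre-fix
case class WithSeq(id: Int, tags: Seq[String]) // Seq is supported

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Convert the unsupported Set field to a Seq so ScalaReflection can derive
// the schema; with SPARK-21204, Set works directly.
val df = Seq(WithSet(1, Set("a", "b")))
  .map(r => WithSeq(r.id, r.tags.toSeq))
  .toDF()
{code}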






[jira] [Commented] (SPARK-21696) State Store can't handle corrupted snapshots

2017-08-10 Thread Alexander Bessonov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122078#comment-16122078
 ] 

Alexander Bessonov commented on SPARK-21696:


{{HDFSBackedStateStoreProvider.doMaintenance()}} will suppress any {{NonFatal}} 
exceptions, and {{StateStore.startMaintenanceIfNeeded()}} wouldn't restart the 
maintenance task if it crashed. The State Store can still function even when a 
snapshot file is corrupted, by simply falling back to deltas.
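
An illustrative sketch of the suppress-and-continue pattern described above 
(simplified names, not the actual Spark source):
{code:java}
import scala.util.control.NonFatal

object MaintenanceSketch {
  def doSnapshot(): Unit = () // hypothetical snapshot generation step
  def cleanup(): Unit = ()    // hypothetical delta-file cleanup step

  // NonFatal errors are logged and swallowed; nothing reschedules a crashed
  // snapshot pass, so a partially written snapshot can linger.
  def doMaintenance(): Unit =
    try { doSnapshot(); cleanup() }
    catch { case NonFatal(e) => Console.err.println(s"maintenance failed: $e") }
}
{code}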

> State Store can't handle corrupted snapshots
> 
>
> Key: SPARK-21696
> URL: https://issues.apache.org/jira/browse/SPARK-21696
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.2.0
>Reporter: Alexander Bessonov
>Priority: Critical
>
> State store's asynchronous maintenance task (generation of Snapshot files) is 
> not rescheduled if it crashes, which might lead to corrupted snapshots.
> In our case, on multiple occasions, executors died during the maintenance 
> task with an Out Of Memory error, which led to the following error on 
> recovery:
> {code:none}
> 17/08/07 20:12:24 WARN TaskSetManager: Lost task 3.1 in stage 102.0 (TID 
> 3314, dnj2-bach-r2n10.bloomberg.com, executor 94): java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$readSnapshotFile(HDFSBackedStateStoreProvider.scala:436)
> at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:314)
> at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:313)
> at scala.Option.getOrElse(Option.scala:121)
> at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap(HDFSBackedStateStoreProvider.scala:313)
> at 
> org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.getStore(HDFSBackedStateStoreProvider.scala:220)
> at 
> org.apache.spark.sql.execution.streaming.state.StateStore$.get(StateStore.scala:186)
> at 
> org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:61)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:99)
> at 
> 

[jira] [Updated] (SPARK-21696) State Store can't handle corrupted snapshots

2017-08-10 Thread Alexander Bessonov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Bessonov updated SPARK-21696:
---
Description: 
State store's asynchronous maintenance task (generation of Snapshot files) is 
not rescheduled if it crashes, which might lead to corrupted snapshots.

In our case, on multiple occasions, executors died during the maintenance task 
with an Out Of Memory error, which led to the following error on recovery:
{code:none}
17/08/07 20:12:24 WARN TaskSetManager: Lost task 3.1 in stage 102.0 (TID 3314, 
dnj2-bach-r2n10.bloomberg.com, executor 94): java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$readSnapshotFile(HDFSBackedStateStoreProvider.scala:436)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:314)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:313)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap(HDFSBackedStateStoreProvider.scala:313)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.getStore(HDFSBackedStateStoreProvider.scala:220)
at 
org.apache.spark.sql.execution.streaming.state.StateStore$.get(StateStore.scala:186)
at 
org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:61)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

  was:
State store's asynchronous maintenance task (generation of Snapshot files) is 
not rescheduled if it crashes, which might lead to corrupted snapshots.

In our case, on multiple occasions, executors died during the maintenance task 
with an Out Of Memory error, which led to the following error on recovery:
{code:text}
17/08/07 20:12:24 WARN TaskSetManager: Lost task 3.1 in stage 102.0 (TID 3314, 
dnj2-bach-r2n10.bloomberg.com, executor 94): java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at 

[jira] [Created] (SPARK-21696) State Store can't handle corrupted snapshots

2017-08-10 Thread Alexander Bessonov (JIRA)
Alexander Bessonov created SPARK-21696:
--

 Summary: State Store can't handle corrupted snapshots
 Key: SPARK-21696
 URL: https://issues.apache.org/jira/browse/SPARK-21696
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 2.2.0, 2.1.1, 2.1.0, 2.0.2, 2.0.1, 2.0.0
Reporter: Alexander Bessonov
Priority: Critical


State store's asynchronous maintenance task (generation of Snapshot files) is 
not rescheduled if it crashes, which might lead to corrupted snapshots.

In our case, on multiple occasions, executors died during the maintenance task 
with an Out Of Memory error, which led to the following error on recovery:
{code:text}
17/08/07 20:12:24 WARN TaskSetManager: Lost task 3.1 in stage 102.0 (TID 3314, 
dnj2-bach-r2n10.bloomberg.com, executor 94): java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$readSnapshotFile(HDFSBackedStateStoreProvider.scala:436)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:314)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$$anonfun$org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap$1.apply(HDFSBackedStateStoreProvider.scala:313)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$loadMap(HDFSBackedStateStoreProvider.scala:313)
at 
org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.getStore(HDFSBackedStateStoreProvider.scala:220)
at 
org.apache.spark.sql.execution.streaming.state.StateStore$.get(StateStore.scala:186)
at 
org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:61)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}






[jira] [Created] (SPARK-20900) ApplicationMaster crashes if SPARK_YARN_STAGING_DIR is not set

2017-05-26 Thread Alexander Bessonov (JIRA)
Alexander Bessonov created SPARK-20900:
--

 Summary: ApplicationMaster crashes if SPARK_YARN_STAGING_DIR is 
not set
 Key: SPARK-20900
 URL: https://issues.apache.org/jira/browse/SPARK-20900
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 2.1.0, 1.6.0, 1.2.0
 Environment: Spark 2.1.0
Reporter: Alexander Bessonov
Priority: Minor


When running {{ApplicationMaster}} directly, if {{SPARK_YARN_STAGING_DIR}} is 
not set or is set to an empty string, {{org.apache.hadoop.fs.Path}} will throw 
an {{IllegalArgumentException}} instead of returning {{null}}. This is not 
handled, and the exception crashes the job.
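
A minimal sketch of the failure mode and a guard, assuming the staging 
directory is read from the environment as described:
{code:java}
import org.apache.hadoop.fs.Path

// new Path("") throws IllegalArgumentException ("Can not create a Path from
// an empty string"), so check the variable before constructing the Path
// instead of expecting a null result.
val stagingDir: Option[Path] =
  sys.env.get("SPARK_YARN_STAGING_DIR").filter(_.nonEmpty).map(new Path(_))
{code}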


