[jira] [Created] (SPARK-22047) HiveExternalCatalogVersionsSuite is Flaky on Jenkins

2017-09-18 Thread Armin Braun (JIRA)
Armin Braun created SPARK-22047:
---

 Summary: HiveExternalCatalogVersionsSuite is Flaky on Jenkins
 Key: SPARK-22047
 URL: https://issues.apache.org/jira/browse/SPARK-22047
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Armin Braun


HiveExternalCatalogVersionsSuite has been failing quite frequently on Jenkins lately, e.g.:

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-sbt-hadoop-2.7/3490/testReport/junit/org.apache.spark.sql.hive/HiveExternalCatalogVersionsSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/

{code}
Error Message

org.scalatest.exceptions.TestFailedException: spark-submit returned with exit 
code 1. Command line: './bin/spark-submit' '--name' 'prepare testing tables' 
'--master' 'local[2]' '--conf' 'spark.ui.enabled=false' '--conf' 
'spark.master.rest.enabled=false' '--conf' 
'spark.sql.warehouse.dir=/home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/target/tmp/warehouse-b266cb0e-5180-4ba8-80a3-b790b3be3aa0'
 '--conf' 'spark.sql.test.version.index=0' '--driver-java-options' 
'-Dderby.system.home=/home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/target/tmp/warehouse-b266cb0e-5180-4ba8-80a3-b790b3be3aa0'
 
'/home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/target/tmp/test120059455549609580.py'
  2017-09-17 04:26:11.641 - stderr> Error: Could not find or load main class 
org.apache.spark.launcher.Main
Stacktrace

sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 
spark-submit returned with exit code 1.
Command line: './bin/spark-submit' '--name' 'prepare testing tables' '--master' 
'local[2]' '--conf' 'spark.ui.enabled=false' '--conf' 
'spark.master.rest.enabled=false' '--conf' 
'spark.sql.warehouse.dir=/home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/target/tmp/warehouse-b266cb0e-5180-4ba8-80a3-b790b3be3aa0'
 '--conf' 'spark.sql.test.version.index=0' '--driver-java-options' 
'-Dderby.system.home=/home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/target/tmp/warehouse-b266cb0e-5180-4ba8-80a3-b790b3be3aa0'
 
'/home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/target/tmp/test120059455549609580.py'

2017-09-17 04:26:11.641 - stderr> Error: Could not find or load main class 
org.apache.spark.launcher.Main
   
at 
org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528)
at 
org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
at org.scalatest.Assertions$class.fail(Assertions.scala:1089)
at org.scalatest.FunSuite.fail(FunSuite.scala:1560)
at 
org.apache.spark.sql.hive.SparkSubmitTestUtils$class.runSparkSubmit(SparkSubmitTestUtils.scala:81)
at 
org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite.runSparkSubmit(HiveExternalCatalogVersionsSuite.scala:38)
at 
org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite$$anonfun$beforeAll$1.apply(HiveExternalCatalogVersionsSuite.scala:120)
at 
org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite$$anonfun$beforeAll$1.apply(HiveExternalCatalogVersionsSuite.scala:105)
at scala.collection.immutable.List.foreach(List.scala:381)
at 
org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite.beforeAll(HiveExternalCatalogVersionsSuite.scala:105)
at 
org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:212)
at 
org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:31)
at 
org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314)
at 
org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:480)
at sbt.ForkMain$Run$2.call(ForkMain.java:296)
at sbt.ForkMain$Run$2.call(ForkMain.java:286)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}






[jira] [Created] (SPARK-21970) Do a Project Wide Sweep for Redundant Throws Declarations

2017-09-10 Thread Armin Braun (JIRA)
Armin Braun created SPARK-21970:
---

 Summary: Do a Project Wide Sweep for Redundant Throws Declarations
 Key: SPARK-21970
 URL: https://issues.apache.org/jira/browse/SPARK-21970
 Project: Spark
  Issue Type: Bug
  Components: Examples, Spark Core, SQL
Affects Versions: 2.3.0
Reporter: Armin Braun
Priority: Trivial


Unfortunately, redundant throws declarations are not caught by Checkstyle and 
there are quite a few in the current Java codebase.
In one case, `ShuffleExternalSorter#closeAndGetSpills`, the redundant declaration also hides some dead code.

I think it's worthwhile to do a sweep for these and remove them.






[jira] [Created] (SPARK-21967) org.apache.spark.unsafe.types.UTF8String#compareTo Should Compare 8 Bytes at a Time for Better Performance

2017-09-10 Thread Armin Braun (JIRA)
Armin Braun created SPARK-21967:
---

 Summary: org.apache.spark.unsafe.types.UTF8String#compareTo Should 
Compare 8 Bytes at a Time for Better Performance
 Key: SPARK-21967
 URL: https://issues.apache.org/jira/browse/SPARK-21967
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: Armin Braun
Priority: Minor


org.apache.spark.unsafe.types.UTF8String#compareTo contains the following TODO:

{code}
int len = Math.min(numBytes, other.numBytes);
// TODO: compare 8 bytes as unsigned long
for (int i = 0; i < len; i ++) {
  // In UTF-8, the byte should be unsigned, so we should compare them as unsigned int.
{code}

The TODO should be resolved by comparing as many full 64-bit words as possible in this method before falling back to unsigned int comparison for the remaining bytes.
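
A minimal sketch (my own illustration, not Spark's actual implementation) of what the word-wise comparison could look like, assuming big-endian word assembly so that unsigned-long ordering matches lexicographic byte ordering:

{code}
// Hypothetical sketch: compare two byte arrays eight bytes at a time as
// unsigned longs, then fall back to unsigned byte comparison for the tail.
def compareBytes(a: Array[Byte], b: Array[Byte]): Int = {
  val len = math.min(a.length, b.length)
  var i = 0
  while (i + 8 <= len) {
    // assemble the next 8 bytes of each array into a big-endian long
    var wa = 0L
    var wb = 0L
    var j = 0
    while (j < 8) {
      wa = (wa << 8) | (a(i + j) & 0xffL)
      wb = (wb << 8) | (b(i + j) & 0xffL)
      j += 1
    }
    if (wa != wb) return java.lang.Long.compareUnsigned(wa, wb)
    i += 8
  }
  // tail: compare the remaining bytes as unsigned ints, as the existing loop does
  while (i < len) {
    val res = (a(i) & 0xff) - (b(i) & 0xff)
    if (res != 0) return res
    i += 1
  }
  a.length - b.length
}
{code}

In UTF8String itself the words would presumably be read directly from the underlying memory rather than assembled byte by byte, but the ordering argument is the same.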






[jira] [Commented] (SPARK-20201) Flaky Test: org.apache.spark.sql.catalyst.expressions.OrderingSuite

2017-09-06 Thread Armin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154975#comment-16154975
 ] 

Armin Braun commented on SPARK-20201:
-

This was resolved by this commit in June 
https://github.com/original-brownbear/spark/commit/b32b2123ddca66e00acf4c9d956232e07f779f9f#diff-4fe0e85423909b24c2a56287468271f1R138

> Flaky Test: org.apache.spark.sql.catalyst.expressions.OrderingSuite
> ---
>
> Key: SPARK-20201
> URL: https://issues.apache.org/jira/browse/SPARK-20201
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Takuya Ueshin
>Priority: Minor
>  Labels: flaky-test
>
> This test failed recently here:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7/2856/testReport/junit/org.apache.spark.sql.catalyst.expressions/OrderingSuite/SPARK_16845__GeneratedClass$SpecificOrdering_grows_beyond_64_KB/
> Dashboard
> https://spark-tests.appspot.com/test-details?suite_name=org.apache.spark.sql.catalyst.expressions.OrderingSuite&test_name=SPARK-16845%3A+GeneratedClass%24SpecificOrdering+grows+beyond+64+KB
> Error Message
> {code}
> java.lang.StackOverflowError
> {code}
> {code}
> com.google.common.util.concurrent.ExecutionError: java.lang.StackOverflowError
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2261)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:903)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.create(GenerateOrdering.scala:188)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.create(GenerateOrdering.scala:43)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:887)
>   at 
> org.apache.spark.sql.catalyst.expressions.OrderingSuite$$anonfun$1.apply$mcV$sp(OrderingSuite.scala:138)
>   at 
> org.apache.spark.sql.catalyst.expressions.OrderingSuite$$anonfun$1.apply(OrderingSuite.scala:131)
>   at 
> org.apache.spark.sql.catalyst.expressions.OrderingSuite$$anonfun$1.apply(OrderingSuite.scala:131)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at org.scalatest.FunSuite.runTest(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(Spa

[jira] [Commented] (SPARK-20336) spark.read.csv() with wholeFile=True option fails to read non ASCII unicode characters

2017-04-25 Thread Armin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15982669#comment-15982669
 ] 

Armin Braun commented on SPARK-20336:
-

[~priancho] my mistake in the above, apparently. I can't retrace the exact version I ran on (maybe I mistakenly ran an old revision, sorry about that), but I do see the same behavior with `master` revision `31345fde82` from today.

{code}
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
17/04/25 12:14:55 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
17/04/25 12:14:57 WARN yarn.Client: Neither spark.yarn.jars nor 
spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Spark context Web UI available at http://192.168.178.57:4040
Spark context available as 'sc' (master = yarn, app id = 
application_1493115274587_0001).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0-SNAPSHOT
      /_/
 
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.read.option("wholeFile", true).option("header", true).csv("file:///tmp/sample.csv").show()
+----+----+-----+
|col1|col2| col3|
+----+----+-----+
|   1|   a| text|
|   2|   b| テキスト|
|   3|   c|  텍스트|
|   4|   d|text
テキスト
텍스트|
|   5|   e| last|
+----+----+-----+



{code}



> spark.read.csv() with wholeFile=True option fails to read non ASCII unicode 
> characters
> --
>
> Key: SPARK-20336
> URL: https://issues.apache.org/jira/browse/SPARK-20336
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
> Environment: Spark 2.2.0 (master branch is downloaded from Github)
> PySpark
>Reporter: HanCheol Cho
>
> I used spark.read.csv() method with wholeFile=True option to load data that 
> has multi-line records.
> However, non-ASCII characters are not properly loaded.
> The following is a sample data for test:
> {code:none}
> col1,col2,col3
> 1,a,text
> 2,b,テキスト
> 3,c,텍스트
> 4,d,"text
> テキスト
> 텍스트"
> 5,e,last
> {code}
> When it is loaded without wholeFile=True option, non-ASCII characters are 
> shown correctly although multi-line records are parsed incorrectly as follows:
> {code:none}
> testdf_default = spark.read.csv("test.encoding.csv", header=True)
> testdf_default.show()
> ++++
> |col1|col2|col3|
> ++++
> |   1|   a|text|
> |   2|   b|テキスト|
> |   3|   c| 텍스트|
> |   4|   d|text|
> |テキスト|null|null|
> | 텍스트"|null|null|
> |   5|   e|last|
> ++++
> {code}
> When wholeFile=True option is used, non-ASCII characters are broken as 
> follows:
> {code:none}
> testdf_wholefile = spark.read.csv("test.encoding.csv", header=True, 
> wholeFile=True)
> testdf_wholefile.show()
> ++++
> |col1|col2|col3|
> ++++
> |   1|   a|text|
> |   2|   b||
> |   3|   c|   �|
> |   4|   d|text
> ...|
> |   5|   e|last|
> ++++
> {code}
> The result is same even if I use encoding="UTF-8" option with wholeFile=True.






[jira] [Created] (SPARK-20455) Missing Test Target in Documentation for "Running Docker-based Integration Test Suites"

2017-04-24 Thread Armin Braun (JIRA)
Armin Braun created SPARK-20455:
---

 Summary: Missing Test Target in Documentation for "Running 
Docker-based Integration Test Suites"
 Key: SPARK-20455
 URL: https://issues.apache.org/jira/browse/SPARK-20455
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 2.1.0
Reporter: Armin Braun
Priority: Minor


The doc at 
http://spark.apache.org/docs/latest/building-spark.html#running-docker-based-integration-test-suites
 is missing the `test` goal in the second line of the Maven build description.

It should be:

{code}
./build/mvn install -DskipTests
./build/mvn test -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.11
{code}

Adding a PR now.






[jira] [Updated] (SPARK-20436) NullPointerException when restart from checkpoint file

2017-04-24 Thread Armin Braun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armin Braun updated SPARK-20436:

Description: 
I have written a Spark Streaming application which has two DStreams.
The code is:
{code}
object KafkaTwoInkfk {
  def main(args: Array[String]) {
    val Array(checkPointDir, brokers, topic1, topic2, batchSize) = args
    val ssc = StreamingContext.getOrCreate(checkPointDir, () => createContext(args))

    ssc.start()
    ssc.awaitTermination()
  }

  def createContext(args : Array[String]) : StreamingContext = {
    val Array(checkPointDir, brokers, topic1, topic2, batchSize) = args
    val sparkConf = new SparkConf().setAppName("KafkaWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(batchSize.toLong))

    ssc.checkpoint(checkPointDir)
    val topicArr1 = topic1.split(",")
    val topicSet1 = topicArr1.toSet
    val topicArr2 = topic2.split(",")
    val topicSet2 = topicArr2.toSet

    val kafkaParams = Map[String, String](
      "metadata.broker.list" -> brokers
    )

    val lines1 = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicSet1)
    val words1 = lines1.map(_._2).flatMap(_.split(" "))
    val wordCounts1 = words1.map(x => { (x, 1L) }).reduceByKey(_ + _)
    wordCounts1.print()

    val lines2 = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topicSet2)
    val words2 = lines1.map(_._2).flatMap(_.split(" "))
    val wordCounts2 = words2.map(x => { (x, 1L) }).reduceByKey(_ + _)
    wordCounts2.print()

    return ssc
  }
}
{code}
When restarting from the checkpoint file, it throws a NullPointerException:
java.lang.NullPointerException
at 
org.apache.spark.streaming.dstream.DStreamCheckpointData$$anonfun$writeObject$1.apply$mcV$sp(DStreamCheckpointData.scala:126)
at 
org.apache.spark.streaming.dstream.DStreamCheckpointData$$anonfun$writeObject$1.apply(DStreamCheckpointData.scala:124)
at 
org.apache.spark.streaming.dstream.DStreamCheckpointData$$anonfun$writeObject$1.apply(DStreamCheckpointData.scala:124)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1291)
at 
org.apache.spark.streaming.dstream.DStreamCheckpointData.writeObject(DStreamCheckpointData.scala:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at 
java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:441)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$writeObject$1.apply$mcV$sp(DStream.scala:528)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$writeObject$1.apply(DStream.scala:523)
at 
org.apache.spark.streaming.dstream.DStream$$anonfun$writeObject$1.apply(DStream.scala:523)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1291)
at 
org.apache.spark.streaming.dstream.DStream.writeObject(DStream.scala:523)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at 
java.io.ObjectOutputStream.defaultWriteO

[jira] [Commented] (SPARK-20155) CSV-files with quoted quotes can't be parsed, if delimiter follows quoted quote

2017-04-24 Thread Armin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981166#comment-15981166
 ] 

Armin Braun commented on SPARK-20155:
-

[~RPCMoritz] take a look at what I just found: 
https://issues.apache.org/jira/browse/SPARK-19834?focusedCommentId=15925375&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15925375
  :)
It's on the radar, apparently.

> CSV-files with quoted quotes can't be parsed, if delimiter follows quoted 
> quote
> ---
>
> Key: SPARK-20155
> URL: https://issues.apache.org/jira/browse/SPARK-20155
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output, SQL
>Affects Versions: 2.0.0
>Reporter: Rick Moritz
>
> According to :
> https://tools.ietf.org/html/rfc4180#section-2
> 7.  If double-quotes are used to enclose fields, then a double-quote
>appearing inside a field must be escaped by preceding it with
>another double quote.  For example:
>"aaa","b""bb","ccc"
> This currently works as is, but the following does not:
>  "aaa","b""b,b","ccc"
> while  "aaa","b\"b,b","ccc" does get parsed.
> I assume, this happens because quotes are currently being parsed in pairs, 
> and that somehow ends up unquoting delimiter.






[jira] [Resolved] (SPARK-20155) CSV-files with quoted quotes can't be parsed, if delimiter follows quoted quote

2017-04-24 Thread Armin Braun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armin Braun resolved SPARK-20155.
-
Resolution: Won't Fix

> CSV-files with quoted quotes can't be parsed, if delimiter follows quoted 
> quote
> ---
>
> Key: SPARK-20155
> URL: https://issues.apache.org/jira/browse/SPARK-20155
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output, SQL
>Affects Versions: 2.0.0
>Reporter: Rick Moritz
>
> According to :
> https://tools.ietf.org/html/rfc4180#section-2
> 7.  If double-quotes are used to enclose fields, then a double-quote
>appearing inside a field must be escaped by preceding it with
>another double quote.  For example:
>"aaa","b""bb","ccc"
> This currently works as is, but the following does not:
>  "aaa","b""b,b","ccc"
> while  "aaa","b\"b,b","ccc" does get parsed.
> I assume, this happens because quotes are currently being parsed in pairs, 
> and that somehow ends up unquoting delimiter.






[jira] [Commented] (SPARK-20155) CSV-files with quoted quotes can't be parsed, if delimiter follows quoted quote

2017-04-24 Thread Armin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981009#comment-15981009
 ] 

Armin Braun commented on SPARK-20155:
-

[~RPCMoritz] sorry, I was under the wrong assumption that quote escaping was enabled by default.

See the difference with your example:

{code}
scala> spark.read.csv("file:///tmp/tmp2.tmp").show()
+---+-----+---+---+
|_c0|  _c1|_c2|_c3|
+---+-----+---+---+
|aaa|"b""b| b"|ccc|
+---+-----+---+---+


scala> spark.read.option("escape", "\"").csv("file:///tmp/tmp2.tmp").show()
+---+-----+---+
|_c0|  _c1|_c2|
+---+-----+---+
|aaa|b"b,b|ccc|
+---+-----+---+
{code}

I think this can be closed; I don't think changing the default behavior is an option here.


> CSV-files with quoted quotes can't be parsed, if delimiter follows quoted 
> quote
> ---
>
> Key: SPARK-20155
> URL: https://issues.apache.org/jira/browse/SPARK-20155
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output, SQL
>Affects Versions: 2.0.0
>Reporter: Rick Moritz
>
> According to :
> https://tools.ietf.org/html/rfc4180#section-2
> 7.  If double-quotes are used to enclose fields, then a double-quote
>appearing inside a field must be escaped by preceding it with
>another double quote.  For example:
>"aaa","b""bb","ccc"
> This currently works as is, but the following does not:
>  "aaa","b""b,b","ccc"
> while  "aaa","b\"b,b","ccc" does get parsed.
> I assume, this happens because quotes are currently being parsed in pairs, 
> and that somehow ends up unquoting delimiter.






[jira] [Commented] (SPARK-20155) CSV-files with quoted quotes can't be parsed, if delimiter follows quoted quote

2017-04-24 Thread Armin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15980974#comment-15980974
 ] 

Armin Braun commented on SPARK-20155:
-

I was able to reproduce this:

{code}
"aaa","b\"b,b","ccc"
{code}

gives us

{code}
scala> spark.read.option("wholeFile", true).csv("file:///tmp/tmp2.csv").show()
+---+-----+---+
|_c0|  _c1|_c2|
+---+-----+---+
|aaa|b"b,b|ccc|
+---+-----+---+

{code}

while

{code}
"aaa","b""b,b","ccc"
{code}

gives us:

{code}
scala> spark.read.option("wholeFile", true).csv("file:///tmp/tmp2.csv").show()
+---+-----+---+---+
|_c0|  _c1|_c2|_c3|
+---+-----+---+---+
|aaa|"b""b| b"|ccc|

{code}

Will try to fix :)

> CSV-files with quoted quotes can't be parsed, if delimiter follows quoted 
> quote
> ---
>
> Key: SPARK-20155
> URL: https://issues.apache.org/jira/browse/SPARK-20155
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output, SQL
>Affects Versions: 2.0.0
>Reporter: Rick Moritz
>
> According to :
> https://tools.ietf.org/html/rfc4180#section-2
> 7.  If double-quotes are used to enclose fields, then a double-quote
>appearing inside a field must be escaped by preceding it with
>another double quote.  For example:
>"aaa","b""bb","ccc"
> This currently works as is, but the following does not:
>  "aaa","b""b,b","ccc"
> while  "aaa","b\"b,b","ccc" does get parsed.
> I assume, this happens because quotes are currently being parsed in pairs, 
> and that somehow ends up unquoting delimiter.






[jira] [Commented] (SPARK-20336) spark.read.csv() with wholeFile=True option fails to read non ASCII unicode characters

2017-04-24 Thread Armin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15980962#comment-15980962
 ] 

Armin Braun commented on SPARK-20336:
-

I just tried this on the latest Spark with Hadoop/YARN `2.6.3`, and it looks fine to me with your file:

{code}
$ bin/spark-shell --master yarn --deploy-mode client

scala> spark.read.option("wholeFile", true).option("header", true).csv("file:///tmp/temp.csv").show()
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   1|   a|text|
|   2|   b|テキスト|
|   3|   c| 텍스트|
|   4|   d|text|
|テキスト|null|null|
|텍스트"|null|null|
|   5|   e|last|
+----+----+----+
{code}

> spark.read.csv() with wholeFile=True option fails to read non ASCII unicode 
> characters
> --
>
> Key: SPARK-20336
> URL: https://issues.apache.org/jira/browse/SPARK-20336
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
> Environment: Spark 2.2.0 (master branch is downloaded from Github)
> PySpark
>Reporter: HanCheol Cho
>
> I used spark.read.csv() method with wholeFile=True option to load data that 
> has multi-line records.
> However, non-ASCII characters are not properly loaded.
> The following is a sample data for test:
> {code:none}
> col1,col2,col3
> 1,a,text
> 2,b,テキスト
> 3,c,텍스트
> 4,d,"text
> テキスト
> 텍스트"
> 5,e,last
> {code}
> When it is loaded without wholeFile=True option, non-ASCII characters are 
> shown correctly although multi-line records are parsed incorrectly as follows:
> {code:none}
> testdf_default = spark.read.csv("test.encoding.csv", header=True)
> testdf_default.show()
> ++++
> |col1|col2|col3|
> ++++
> |   1|   a|text|
> |   2|   b|テキスト|
> |   3|   c| 텍스트|
> |   4|   d|text|
> |テキスト|null|null|
> | 텍스트"|null|null|
> |   5|   e|last|
> ++++
> {code}
> When wholeFile=True option is used, non-ASCII characters are broken as 
> follows:
> {code:none}
> testdf_wholefile = spark.read.csv("test.encoding.csv", header=True, 
> wholeFile=True)
> testdf_wholefile.show()
> ++++
> |col1|col2|col3|
> ++++
> |   1|   a|text|
> |   2|   b||
> |   3|   c|   �|
> |   4|   d|text
> ...|
> |   5|   e|last|
> ++++
> {code}
> The result is same even if I use encoding="UTF-8" option with wholeFile=True.






[jira] [Resolved] (SPARK-17280) Flaky test: org.apache.spark.streaming.kafka010.JavaKafkaRDDSuite and JavaDirectKafkaStreamSuite.testKafkaStream

2017-02-22 Thread Armin Braun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armin Braun resolved SPARK-17280.
-
Resolution: Fixed

Closing this; I can't find any recent examples of it on Jenkins and haven't experienced it locally lately either.
I also tried reproducing it by running 1k+ loops of all the Kafka 0.10_2.11 tests with 3 forks in parallel, without issues.

> Flaky test: org.apache.spark.streaming.kafka010.JavaKafkaRDDSuite and 
> JavaDirectKafkaStreamSuite.testKafkaStream
> 
>
> Key: SPARK-17280
> URL: https://issues.apache.org/jira/browse/SPARK-17280
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, Tests
>Reporter: Yin Huai
>
> https://spark-tests.appspot.com/builds/spark-master-test-maven-hadoop-2.2/1793
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.2/1793/
> {code}
> org.apache.spark.streaming.kafka010.JavaDirectKafkaStreamSuite.testKafkaStream
> Error Message
> assertion failed: Partition [topic1, 0] metadata not propagated after timeout
> Stacktrace
> java.util.concurrent.TimeoutException: assertion failed: Partition [topic1, 
> 0] metadata not propagated after timeout
>   at 
> org.apache.spark.streaming.kafka010.JavaDirectKafkaStreamSuite.createTopicAndSendData(JavaDirectKafkaStreamSuite.java:176)
>   at 
> org.apache.spark.streaming.kafka010.JavaDirectKafkaStreamSuite.testKafkaStream(JavaDirectKafkaStreamSuite.java:74)
> {code}
> {code}
> org.apache.spark.streaming.kafka010.JavaKafkaRDDSuite.testKafkaRDD
> Error Message
> Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most 
> recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost): 
> java.lang.AssertionError: assertion failed: Failed to get records for 
> spark-executor-java-test-consumer--363965267-1472280538438 topic2 0 0 after 
> polling for 512
>  at scala.Predef$.assert(Predef.scala:170)
>  at 
> org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74)
>  at 
> org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:227)
>  at 
> org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:193)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>  at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1684)
>  at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1134)
>  at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1134)
>  at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1910)
>  at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1910)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>  at org.apache.spark.scheduler.Task.run(Task.scala:86)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
> Stacktrace
> org.apache.spark.SparkException: 
> Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most 
> recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost): 
> java.lang.AssertionError: assertion failed: Failed to get records for 
> spark-executor-java-test-consumer--363965267-1472280538438 topic2 0 0 after 
> polling for 512
>   at scala.Predef$.assert(Predef.scala:170)
>   at 
> org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74)
>   at 
> org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:227)
>   at 
> org.apache.spark.streaming.kafka010.KafkaRDD$KafkaRDDIterator.next(KafkaRDD.scala:193)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>   at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1684)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1134)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1134)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1910)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1910)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:86)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)

[jira] [Closed] (SPARK-19592) Duplication in Test Configuration Relating to SparkConf Settings Should be Removed

2017-02-18 Thread Armin Braun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armin Braun closed SPARK-19592.
---
Resolution: Won't Fix

> Duplication in Test Configuration Relating to SparkConf Settings Should be 
> Removed
> --
>
> Key: SPARK-19592
> URL: https://issues.apache.org/jira/browse/SPARK-19592
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.1.0, 2.2.0
> Environment: Applies to all Environments
>Reporter: Armin Braun
>Priority: Minor
>
> This configuration for Surefire, Scalatest is duplicated in the parent POM as 
> well as the SBT build.
> While this duplication cannot be removed in general it can at least be 
> removed for all system properties that simply result in a SparkConf setting I 
> think.
> Instead of having lines like 
> {code}
> false
> {code}
> twice in the pom.xml
> and once in SBT as
> {code}
> javaOptions in Test += "-Dspark.ui.enabled=false",
> {code}
> it would be a lot cleaner to simply have a 
> {code}
> var conf: SparkConf 
> {code}
> field in 
> {code}
> org.apache.spark.SparkFunSuite
> {code}
>  that has SparkConf set up with all the shared configuration that 
> `systemProperties` currently provide. Obviously this cannot be done straight 
> away given that
> many subclasses of the parent suit do this, so I think it would be best to 
> simply add a method to the parent that provides this configuration for now
> and start refactoring away duplication in other suit setups from there step 
> by step until the sys properties can be removed from the pom and sbt.build.
> This makes the build a lot easier to maintain and makes tests more readable 
> by making the environment setup more explicit in the code.
> (also it would allow running more tests straight from the IDE which is always 
> a nice thing imo)






[jira] [Commented] (SPARK-19592) Duplication in Test Configuration Relating to SparkConf Settings Should be Removed

2017-02-16 Thread Armin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869985#comment-15869985
 ] 

Armin Braun commented on SPARK-19592:
-

I see your point on these two:

{quote}
Isn't this going to mean changing every single test suite?
That is to say I could kind of imagine a broader cleanup and refactoring of 
test state. A big change just to remove a few lines of config doesn't seem 
worth it.
{quote}

Yeah, it will obviously require wider changes (but see below). It looks to me like this would be a valid start for cleaning up test state in general.

{quote}
 Ideally that's cleaned up all in one go or not.
{quote}

You could go test suite by test suite and eventually drop the properties being injected by the build system; doing this all in one go would admittedly be a big change.
Even an incremental approach (doing this step by step, with the test setup inside ScalaTest temporarily redundant) would be worth it in my opinion: it would already make the test environment more readable (and, less importantly but nice to have, runnable from the IDE), wouldn't it?

> Duplication in Test Configuration Relating to SparkConf Settings Should be 
> Removed
> --
>
> Key: SPARK-19592
> URL: https://issues.apache.org/jira/browse/SPARK-19592
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.1.0, 2.2.0
> Environment: Applies to all Environments
>Reporter: Armin Braun
>Priority: Minor
>
> This configuration for Surefire, Scalatest is duplicated in the parent POM as 
> well as the SBT build.
> While this duplication cannot be removed in general it can at least be 
> removed for all system properties that simply result in a SparkConf setting I 
> think.
> Instead of having lines like 
> {code}
> false
> {code}
> twice in the pom.xml
> and once in SBT as
> {code}
> javaOptions in Test += "-Dspark.ui.enabled=false",
> {code}
> it would be a lot cleaner to simply have a 
> {code}
> var conf: SparkConf 
> {code}
> field in 
> {code}
> org.apache.spark.SparkFunSuite
> {code}
>  that has SparkConf set up with all the shared configuration that 
> `systemProperties` currently provide. Obviously this cannot be done straight 
> away given that
> many subclasses of the parent suit do this, so I think it would be best to 
> simply add a method to the parent that provides this configuration for now
> and start refactoring away duplication in other suit setups from there step 
> by step until the sys properties can be removed from the pom and sbt.build.
> This makes the build a lot easier to maintain and makes tests more readable 
> by making the environment setup more explicit in the code.
> (also it would allow running more tests straight from the IDE which is always 
> a nice thing imo)






[jira] [Commented] (SPARK-19592) Duplication in Test Configuration Relating to SparkConf Settings Should be Removed

2017-02-16 Thread Armin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869933#comment-15869933
 ] 

Armin Braun commented on SPARK-19592:
-

[~srowen] could I convince you, or is it better to drop this one? :)

> Duplication in Test Configuration Relating to SparkConf Settings Should be 
> Removed
> --
>
> Key: SPARK-19592
> URL: https://issues.apache.org/jira/browse/SPARK-19592
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.1.0, 2.2.0
> Environment: Applies to all Environments
>Reporter: Armin Braun
>Priority: Minor
>
> This configuration for Surefire, Scalatest is duplicated in the parent POM as 
> well as the SBT build.
> While this duplication cannot be removed in general it can at least be 
> removed for all system properties that simply result in a SparkConf setting I 
> think.
> Instead of having lines like 
> {code}
> false
> {code}
> twice in the pom.xml
> and once in SBT as
> {code}
> javaOptions in Test += "-Dspark.ui.enabled=false",
> {code}
> it would be a lot cleaner to simply have a 
> {code}
> var conf: SparkConf 
> {code}
> field in 
> {code}
> org.apache.spark.SparkFunSuite
> {code}
>  that has SparkConf set up with all the shared configuration that 
> `systemProperties` currently provide. Obviously this cannot be done straight 
> away given that
> many subclasses of the parent suit do this, so I think it would be best to 
> simply add a method to the parent that provides this configuration for now
> and start refactoring away duplication in other suit setups from there step 
> by step until the sys properties can be removed from the pom and sbt.build.
> This makes the build a lot easier to maintain and makes tests more readable 
> by making the environment setup more explicit in the code.
> (also it would allow running more tests straight from the IDE which is always 
> a nice thing imo)






[jira] [Resolved] (SPARK-19275) Spark Streaming, Kafka receiver, "Failed to get records for ... after polling for 512"

2017-02-14 Thread Armin Braun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armin Braun resolved SPARK-19275.
-
Resolution: Not A Problem

> Spark Streaming, Kafka receiver, "Failed to get records for ... after polling 
> for 512"
> --
>
> Key: SPARK-19275
> URL: https://issues.apache.org/jira/browse/SPARK-19275
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.0.0
> Environment: Apache Spark 2.0.0, Kafka 0.10 for Scala 2.11
>Reporter: Dmitry Ochnev
>
> We have a Spark Streaming application reading records from Kafka 0.10.
> Some tasks are failed because of the following error:
> "java.lang.AssertionError: assertion failed: Failed to get records for (...) 
> after polling for 512"
> The first attempt fails and the second attempt (retry) completes 
> successfully, - this is the pattern that we see for many tasks in our logs. 
> These fails and retries consume resources.
> A similar case with a stack trace are described here:
> https://www.mail-archive.com/user@spark.apache.org/msg56564.html
> https://gist.github.com/SrikanthTati/c2e95c4ac689cd49aab817e24ec42767
> Here is the line from the stack trace where the error is raised:
> org.apache.spark.streaming.kafka010.CachedKafkaConsumer.get(CachedKafkaConsumer.scala:74)
> We tried several values for "spark.streaming.kafka.consumer.poll.ms", - 2, 5, 
> 10, 30 and 60 seconds, but the error appeared in all the cases except the 
> last one. Moreover, increasing the threshold led to increasing total Spark 
> stage duration.
> In other words, increasing "spark.streaming.kafka.consumer.poll.ms" led to 
> fewer task failures but with cost of total stage duration. So, it is bad for 
> performance when processing data streams.
> We have a suspicion that there is a bug in CachedKafkaConsumer (and/or other 
> related classes) which inhibits the reading process.






[jira] [Commented] (SPARK-19592) Duplication in Test Configuration Relating to SparkConf Settings Should be Removed

2017-02-14 Thread Armin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866428#comment-15866428
 ] 

Armin Braun commented on SPARK-19592:
-

IMO this also relates to the ability to handle https://issues.apache.org/jira/browse/SPARK-8985 in a clean way, btw.

> Duplication in Test Configuration Relating to SparkConf Settings Should be 
> Removed
> --
>
> Key: SPARK-19592
> URL: https://issues.apache.org/jira/browse/SPARK-19592
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.1.0, 2.2.0
> Environment: Applies to all Environments
>Reporter: Armin Braun
>Priority: Minor
>
> This configuration for Surefire, Scalatest is duplicated in the parent POM as 
> well as the SBT build.
> While this duplication cannot be removed in general it can at least be 
> removed for all system properties that simply result in a SparkConf setting I 
> think.
> Instead of having lines like 
> {code}
> false
> {code}
> twice in the pom.xml
> and once in SBT as
> {code}
> javaOptions in Test += "-Dspark.ui.enabled=false",
> {code}
> it would be a lot cleaner to simply have a 
> {code}
> var conf: SparkConf 
> {code}
> field in 
> {code}
> org.apache.spark.SparkFunSuite
> {code}
>  that has SparkConf set up with all the shared configuration that 
> `systemProperties` currently provide. Obviously this cannot be done straight 
> away given that
> many subclasses of the parent suit do this, so I think it would be best to 
> simply add a method to the parent that provides this configuration for now
> and start refactoring away duplication in other suit setups from there step 
> by step until the sys properties can be removed from the pom and sbt.build.
> This makes the build a lot easier to maintain and makes tests more readable 
> by making the environment setup more explicit in the code.
> (also it would allow running more tests straight from the IDE which is always 
> a nice thing imo)






[jira] [Commented] (SPARK-19592) Duplication in Test Configuration Relating to SparkConf Settings Should be Removed

2017-02-14 Thread Armin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866367#comment-15866367
 ] 

Armin Braun commented on SPARK-19592:
-

[~srowen] 

{quote}
What about tests that make their own conf or need to?
{quote}

Those tests in particular made me interested in this for 
correctness/readability reasons. Maybe an example helps :)

In _org.apache.spark.streaming.InputStreamsSuite_ we have the situation that the conf is set up in the parent suite via just

{code}
  val conf = new SparkConf()
.setMaster(master)
.setAppName(framework)
{code}

Now if you run that suite from the IDE, one of the tests fails with an apparent error in the logic.

{code}
The code passed to eventually never returned normally. Attempted 664 times over 
10.01260721901 seconds. Last failure message: 10 did not equal 5.
{code}

You debug it and find out that it's because a _StreamingListener_ gets added to the context twice: the test adds one manually that is already on the context, since it is also added by the UI when _spark.ui.enabled_ is left at its default of _true_.

So basically you now have a seemingly redundant line of code in a bunch of 
tests:

{code}
ssc.addStreamingListener(ssc.progressListener)
{code}

... that appears wrong given the configuration you see if you just read the code, and that requires you to also consider (and maintain) what Maven or SBT is injecting into the environment.
---

So I think the tests that make their own config are the most troublesome, since they have non-standard defaults injected. In my opinion it would be a lot easier to work with if a newly created SparkConf simply had the standard production defaults, with all deviation from that being explicit in the code.

I agree it's not a big pain, but it's still a quality issue worth fixing (imo): it reduces maintenance effort through drier test configs and makes tests easier to read, as in the example above.
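
For illustration, a minimal sketch (my own, not an actual Spark change) of the kind of shared-configuration field the description proposes, assuming the shared settings are just the ones the build currently injects (e.g. spark.ui.enabled=false); the class and method names here are hypothetical:

{code}
import org.apache.spark.SparkConf
import org.scalatest.FunSuite

// Hypothetical base suite sketch: the SparkConf carries the settings that the
// pom.xml / SBT builds currently inject as -D system properties, so suites
// see the same defaults without any build-side configuration.
abstract class SparkFunSuiteSketch extends FunSuite {

  // Shared test defaults; the exact set of properties here is illustrative only.
  protected def sharedTestConf: SparkConf = new SparkConf()
    .set("spark.ui.enabled", "false")
    .set("spark.master.rest.enabled", "false")

  // Suites that need deviations start from sharedTestConf and override
  // explicitly in code instead of relying on injected -D flags.
  protected var conf: SparkConf = sharedTestConf
}
{code}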

> Duplication in Test Configuration Relating to SparkConf Settings Should be 
> Removed
> --
>
> Key: SPARK-19592
> URL: https://issues.apache.org/jira/browse/SPARK-19592
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.1.0, 2.2.0
> Environment: Applies to all Environments
>Reporter: Armin Braun
>Priority: Minor
>
> This configuration for Surefire, Scalatest is duplicated in the parent POM as 
> well as the SBT build.
> While this duplication cannot be removed in general it can at least be 
> removed for all system properties that simply result in a SparkConf setting I 
> think.
> Instead of having lines like 
> {code}
> false
> {code}
> twice in the pom.xml
> and once in SBT as
> {code}
> javaOptions in Test += "-Dspark.ui.enabled=false",
> {code}
> it would be a lot cleaner to simply have a 
> {code}
> var conf: SparkConf 
> {code}
> field in 
> {code}
> org.apache.spark.SparkFunSuite
> {code}
>  that has SparkConf set up with all the shared configuration that 
> `systemProperties` currently provide. Obviously this cannot be done straight 
> away given that
> many subclasses of the parent suit do this, so I think it would be best to 
> simply add a method to the parent that provides this configuration for now
> and start refactoring away duplication in other suit setups from there step 
> by step until the sys properties can be removed from the pom and sbt.build.
> This makes the build a lot easier to maintain and makes tests more readable 
> by making the environment setup more explicit in the code.
> (also it would allow running more tests straight from the IDE which is always 
> a nice thing imo)






[jira] [Updated] (SPARK-19592) Duplication in Test Configuration Relating to SparkConf Settings Should be Removed

2017-02-14 Thread Armin Braun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armin Braun updated SPARK-19592:

Affects Version/s: 2.2.0

> Duplication in Test Configuration Relating to SparkConf Settings Should be 
> Removed
> --
>
> Key: SPARK-19592
> URL: https://issues.apache.org/jira/browse/SPARK-19592
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.1.0, 2.2.0
> Environment: Applies to all Environments
>Reporter: Armin Braun
>Priority: Minor
>
> This configuration for Surefire, Scalatest is duplicated in the parent POM as 
> well as the SBT build.
> While this duplication cannot be removed in general it can at least be 
> removed for all system properties that simply result in a SparkConf setting I 
> think.
> Instead of having lines like 
> {code}
> false
> {code}
> twice in the pom.xml
> and once in SBT as
> {code}
> javaOptions in Test += "-Dspark.ui.enabled=false",
> {code}
> it would be a lot cleaner to simply have a 
> {code}
> var conf: SparkConf 
> {code}
> field in 
> {code}
> org.apache.spark.SparkFunSuite
> {code}
>  that has SparkConf set up with all the shared configuration that 
> `systemProperties` currently provide. Obviously this cannot be done straight 
> away given that
> many subclasses of the parent suit do this, so I think it would be best to 
> simply add a method to the parent that provides this configuration for now
> and start refactoring away duplication in other suit setups from there step 
> by step until the sys properties can be removed from the pom and sbt.build.
> This makes the build a lot easier to maintain and makes tests more readable 
> by making the environment setup more explicit in the code.
> (also it would allow running more tests straight from the IDE which is always 
> a nice thing imo)






[jira] [Updated] (SPARK-19592) Duplication in Test Configuration Relating to SparkConf Settings Should be Removed

2017-02-14 Thread Armin Braun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armin Braun updated SPARK-19592:

Description: 
This configuration for Surefire/ScalaTest is duplicated in the parent POM as well as in the SBT build.
While this duplication cannot be removed in general, it can at least be removed for all system properties that simply result in a SparkConf setting, I think.

Instead of having lines like
{code}
<spark.ui.enabled>false</spark.ui.enabled>
{code}
twice in the pom.xml
and once in SBT as
{code}
javaOptions in Test += "-Dspark.ui.enabled=false",
{code}
it would be a lot cleaner to simply have a
{code}
var conf: SparkConf
{code}
field in
{code}
org.apache.spark.SparkFunSuite
{code}
that has the SparkConf set up with all the shared configuration that the `systemProperties` currently provide. Obviously this cannot be done straight away, given that many subclasses of the parent suite do this, so I think it would be best to simply add a method to the parent that provides this configuration for now and start refactoring away duplication in the other suite setups from there, step by step, until the system properties can be removed from the pom.xml and the SBT build.

This makes the build a lot easier to maintain and makes the tests more readable by making the environment setup more explicit in the code.
(Also, it would allow running more tests straight from the IDE, which is always a nice thing imo.)

  was:
This configuration for Surefire, Scalatest is duplicated in the parent POM as 
well as the SBT build.
While this duplication cannot be removed in general it can at least be removed 
for all system properties that simply result in a SparkConf setting I think.

Instead of having lines like 
{code}
false
{code}
twice in the pom.xml
and once in SBT as
{code}
javaOptions in Test += "-Dspark.ui.enabled=false",
{code}
it would be a lot cleaner to simply have a 
{code}
var conf: SparkConf 
{code} 
field in 
{code}
org.apache.spark.SparkFunSuite
{code}
 that has SparkConf set up with all the shared configuration that 
`systemProperties` currently provide.

This makes the build a lot easier to maintain and makes tests more readable by 
making the environment setup more explicit in the code.
(also it would allow running more tests straight from the IDE which is always a 
nice thing imo)


> Duplication in Test Configuration Relating to SparkConf Settings Should be 
> Removed
> --
>
> Key: SPARK-19592
> URL: https://issues.apache.org/jira/browse/SPARK-19592
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.1.0
> Environment: Applies to all Environments
>Reporter: Armin Braun
>Priority: Minor
>
> This configuration for Surefire, Scalatest is duplicated in the parent POM as 
> well as the SBT build.
> While this duplication cannot be removed in general it can at least be 
> removed for all system properties that simply result in a SparkConf setting I 
> think.
> Instead of having lines like 
> {code}
> false
> {code}
> twice in the pom.xml
> and once in SBT as
> {code}
> javaOptions in Test += "-Dspark.ui.enabled=false",
> {code}
> it would be a lot cleaner to simply have a 
> {code}
> var conf: SparkConf 
> {code}
> field in 
> {code}
> org.apache.spark.SparkFunSuite
> {code}
>  that has SparkConf set up with all the shared configuration that 
> `systemProperties` currently provide. Obviously this cannot be done straight 
> away given that
> many subclasses of the parent suit do this, so I think it would be best to 
> simply add a method to the parent that provides this configuration for now
> and start refactoring away duplication in other suit setups from there step 
> by step until the sys properties can be removed from the pom and sbt.build.
> This makes the build a lot easier to maintain and makes tests more readable 
> by making the environment setup more explicit in the code.
> (also it would allow running more tests straight from the IDE which is always 
> a nice thing imo)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19592) Duplication in Test Configuration Relating to SparkConf Settings Should be Removed

2017-02-14 Thread Armin Braun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armin Braun updated SPARK-19592:

Description: 
This configuration for Surefire and Scalatest is duplicated in the parent POM 
as well as the SBT build.
While this duplication cannot be removed in general, I think it can at least be 
removed for all system properties that simply result in a SparkConf setting.

Instead of having lines like 
{code}
<spark.ui.enabled>false</spark.ui.enabled>
{code}
twice in the pom.xml
and once in SBT as
{code}
javaOptions in Test += "-Dspark.ui.enabled=false",
{code}
it would be a lot cleaner to simply have a 
{code}
var conf: SparkConf 
{code} 
field in 
{code}
org.apache.spark.SparkFunSuite
{code}
 that has SparkConf set up with all the shared configuration that 
`systemProperties` currently provide.

This makes the build a lot easier to maintain and makes tests more readable by 
making the environment setup more explicit in the code.
(also it would allow running more tests straight from the IDE which is always a 
nice thing imo)

  was:
This configuration for Surefire and Scalatest is duplicated in the parent POM 
as well as the SBT build.
While this duplication cannot be removed in general, I think it can at least be 
removed for all system properties that simply result in a SparkConf setting.

Instead of having lines like 
{code}
<spark.ui.enabled>false</spark.ui.enabled>
{code}
twice in the pom.xml
and once in SBT as
{code}
javaOptions in Test += "-Dspark.ui.enabled=false",
{code}
it would be a lot cleaner to simply have a 
{code}
var conf: SparkConf 
{code} 
field in 
{code}
org.apache.spark.SparkFunSuite
{code}
 that has SparkConf set up with all the shared configuration that 
`systemProperties` currently provide.

This makes the build a lot easier to maintain and makes tests more readable by 
making the environment setup more explicit in the code.


> Duplication in Test Configuration Relating to SparkConf Settings Should be 
> Removed
> --
>
> Key: SPARK-19592
> URL: https://issues.apache.org/jira/browse/SPARK-19592
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.1.0
> Environment: Applies to all Environments
>Reporter: Armin Braun
>Priority: Minor
>
> This configuration for Surefire, Scalatest is duplicated in the parent POM as 
> well as the SBT build.
> While this duplication cannot be removed in general it can at least be 
> removed for all system properties that simply result in a SparkConf setting I 
> think.
> Instead of having lines like 
> {code}
> <spark.ui.enabled>false</spark.ui.enabled>
> {code}
> twice in the pom.xml
> and once in SBT as
> {code}
> javaOptions in Test += "-Dspark.ui.enabled=false",
> {code}
> it would be a lot cleaner to simply have a 
> {code}
> var conf: SparkConf 
> {code} 
> field in 
> {code}
> org.apache.spark.SparkFunSuite
> {code}
>  that has SparkConf set up with all the shared configuration that 
> `systemProperties` currently provide.
> This makes the build a lot easier to maintain and makes tests more readable 
> by making the environment setup more explicit in the code.
> (also it would allow running more tests straight from the IDE which is always 
> a nice thing imo)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19592) Duplication in Test Configuration Relating to SparkConf Settings Should be Removed

2017-02-14 Thread Armin Braun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armin Braun updated SPARK-19592:

Description: 
This configuration for Surefire and Scalatest is duplicated in the parent POM 
as well as the SBT build.
While this duplication cannot be removed in general, I think it can at least be 
removed for all system properties that simply result in a SparkConf setting.

Instead of having lines like 
{code}
<spark.ui.enabled>false</spark.ui.enabled>
{code}
twice in the pom.xml
and once in SBT as
{code}
javaOptions in Test += "-Dspark.ui.enabled=false",
{code}
it would be a lot cleaner to simply have a 
{code}
var conf: SparkConf 
{code} 
field in 
{code}
org.apache.spark.SparkFunSuite
{code}
 that has SparkConf set up with all the shared configuration that 
`systemProperties` currently provide.

This makes the build a lot easier to maintain and makes tests more readable by 
making the environment setup more explicit in the code.

  was:
This configuration for Surefire and Scalatest is duplicated in the parent POM 
as well as the SBT build.
While this duplication cannot be removed in general, I think it can at least be 
removed for all system properties that simply result in a SparkConf setting.

Instead of having lines like 
{code}
<spark.ui.enabled>false</spark.ui.enabled>
{code}
twice in the pom.xml
and once in SBT as
{code}
javaOptions in Test += "-Dspark.ui.enabled=false",
{code}
it would be a lot cleaner to simply have a `conf` field in 
`org.apache.spark.SparkFunSuite` that has SparkConf set up with all the shared 
configuration that `systemProperties` currently provide.

This makes the build a lot easier to maintain and makes tests more readable by 
making the environment setup more explicit in the code.


> Duplication in Test Configuration Relating to SparkConf Settings Should be 
> Removed
> --
>
> Key: SPARK-19592
> URL: https://issues.apache.org/jira/browse/SPARK-19592
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.1.0
> Environment: Applies to all Environments
>Reporter: Armin Braun
>Priority: Minor
>
> This configuration for Surefire, Scalatest is duplicated in the parent POM as 
> well as the SBT build.
> While this duplication cannot be removed in general it can at least be 
> removed for all system properties that simply result in a SparkConf setting I 
> think.
> Instead of having lines like 
> {code}
> <spark.ui.enabled>false</spark.ui.enabled>
> {code}
> twice in the pom.xml
> and once in SBT as
> {code}
> javaOptions in Test += "-Dspark.ui.enabled=false",
> {code}
> it would be a lot cleaner to simply have a 
> {code}
> var conf: SparkConf 
> {code} 
> field in 
> {code}
> org.apache.spark.SparkFunSuite
> {code}
>  that has SparkConf set up with all the shared configuration that 
> `systemProperties` currently provide.
> This makes the build a lot easier to maintain and makes tests more readable 
> by making the environment setup more explicit in the code.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19592) Duplication in Test Configuration Relating to SparkConf Settings Should be Removed

2017-02-14 Thread Armin Braun (JIRA)
Armin Braun created SPARK-19592:
---

 Summary: Duplication in Test Configuration Relating to SparkConf 
Settings Should be Removed
 Key: SPARK-19592
 URL: https://issues.apache.org/jira/browse/SPARK-19592
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Affects Versions: 2.1.0
 Environment: Applies to all Environments
Reporter: Armin Braun
Priority: Minor


This configuration for Surefire and Scalatest is duplicated in the parent POM 
as well as the SBT build.
While this duplication cannot be removed in general, I think it can at least be 
removed for all system properties that simply result in a SparkConf setting.

Instead of having lines like 
{code}
<spark.ui.enabled>false</spark.ui.enabled>
{code}
twice in the pom.xml
and once in SBT as
{code}
javaOptions in Test += "-Dspark.ui.enabled=false",
{code}
it would be a lot cleaner to simply have a `conf` field in 
`org.apache.spark.SparkFunSuite` that has SparkConf set up with all the shared 
configuration that `systemProperties` currently provide.

This makes the build a lot easier to maintain and makes tests more readable by 
making the environment setup more explicit in the code.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19562) Gitignore Misses Folder dev/pr-deps

2017-02-12 Thread Armin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862954#comment-15862954
 ] 

Armin Braun commented on SPARK-19562:
-

PR added https://github.com/apache/spark/pull/16904 

> Gitignore Misses Folder dev/pr-deps
> ---
>
> Key: SPARK-19562
> URL: https://issues.apache.org/jira/browse/SPARK-19562
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: Applies to all Environments
>Reporter: Armin Braun
>Priority: Trivial
>
> It's basically in the title.
> Running the build and tests as instructed by the Readme creates the folder 
> `dev/pr-deps` that is not covered by the gitignore leaving us with this:
> {code:none}
> ➜  spark git:(master) ✗ git status
>   
> On branch master
> Your branch is up-to-date with 'origin/master'.
> Untracked files:
>   (use "git add <file>..." to include in what will be committed)
>   dev/pr-deps/
> {code}
> I think that folder should be added to the gitignore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19562) Gitignore Misses Folder dev/pr-deps

2017-02-12 Thread Armin Braun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armin Braun updated SPARK-19562:

Description: 
It's basically in the title.
Running the build and tests as instructed by the README creates the folder 
`dev/pr-deps`, which is not covered by the gitignore, leaving us with this:

{code:none}
➜  spark git:(master) ✗ git status  

On branch master
Your branch is up-to-date with 'origin/master'.
Untracked files:
  (use "git add ..." to include in what will be committed)

dev/pr-deps/
{code}

I think that folder should be added to the gitignore.
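
For illustration, the fix would amount to a single additional ignore entry, 
something along these lines:
{code:none}
# proposed .gitignore addition
dev/pr-deps/
{code}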

  was:
It's basically in the title.
Running the build and tests as instructed by the README creates the folder 
`dev/pr-deps`, which is not covered by the gitignore, leaving us with this:

{code:bash}
➜  spark git:(master) ✗ git status  

On branch master
Your branch is up-to-date with 'origin/master'.
Untracked files:
  (use "git add ..." to include in what will be committed)

dev/pr-deps/
{code}

I think that folder should be added to the gitignore.


> Gitignore Misses Folder dev/pr-deps
> ---
>
> Key: SPARK-19562
> URL: https://issues.apache.org/jira/browse/SPARK-19562
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.0
> Environment: Applies to all Environments
>Reporter: Armin Braun
>Priority: Trivial
>
> It's basically in the title.
> Running the build and tests as instructed by the Readme creates the folder 
> `dev/pr-deps` that is not covered by the gitignore leaving us with this:
> {code:none}
> ➜  spark git:(master) ✗ git status
>   
> On branch master
> Your branch is up-to-date with 'origin/master'.
> Untracked files:
>   (use "git add <file>..." to include in what will be committed)
>   dev/pr-deps/
> {code}
> I think that folder should be added to the gitignore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19562) Gitignore Misses Folder dev/pr-deps

2017-02-12 Thread Armin Braun (JIRA)
Armin Braun created SPARK-19562:
---

 Summary: Gitignore Misses Folder dev/pr-deps
 Key: SPARK-19562
 URL: https://issues.apache.org/jira/browse/SPARK-19562
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.1.0
 Environment: Applies to all Environments
Reporter: Armin Braun
Priority: Trivial


It's basically in the title.
Running the build and tests as instructed by the README creates the folder 
`dev/pr-deps`, which is not covered by the gitignore, leaving us with this:

{code:bash}
➜  spark git:(master) ✗ git status  

On branch master
Your branch is up-to-date with 'origin/master'.
Untracked files:
  (use "git add ..." to include in what will be committed)

dev/pr-deps/
{code}

I think that folder should be added to the gitignore.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org