[jira] [Commented] (SPARK-23514) Replace spark.sparkContext.hadoopConfiguration by spark.sessionState.newHadoopConf()

2018-02-24 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375978#comment-16375978
 ] 

Xiao Li commented on SPARK-23514:
-

cc [~dongjoon] Do you want to give it a try?

> Replace spark.sparkContext.hadoopConfiguration by 
> spark.sessionState.newHadoopConf()
> 
>
> Key: SPARK-23514
> URL: https://issues.apache.org/jira/browse/SPARK-23514
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Priority: Major
>
> Check all the places where we directly use 
> {{spark.sparkContext.hadoopConfiguration}}. Instead, in some scenarios, it 
> makes more sense to call {{spark.sessionState.newHadoopConf()}} which blends 
> in settings from SQLConf.






[jira] [Updated] (SPARK-23405) The task will hang when a small table left-semi-joins a big table

2018-02-24 Thread KaiXinXIaoLei (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

KaiXinXIaoLei updated SPARK-23405:
--
Description: 
# I run a SQL query: `select ls.cs_order_number from ls left semi join 
catalog_sales cs on ls.cs_order_number = cs.cs_order_number`. The `ls` table 
is a small table with only one row. The `catalog_sales` table is a big table 
with 10 billion rows. The task hangs:

!taskhang up.png!

And the SQL page is:

!SQL.png!

  was:
I run a SQL query: `select ls.cs_order_number from ls left semi join 
catalog_sales cs on ls.cs_order_number = cs.cs_order_number`. The `ls` table 
is a small table with only one row. The `catalog_sales` table is a big table 
with 10 billion rows. The task hangs:

!taskhang up.png!

And the SQL page is:

!SQL.png!


> The task will hang when a small table left-semi-joins a big table
> ---
>
> Key: SPARK-23405
> URL: https://issues.apache.org/jira/browse/SPARK-23405
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: KaiXinXIaoLei
>Priority: Major
> Attachments: SQL.png, taskhang up.png
>
>
> # I run a SQL query: `select ls.cs_order_number from ls left semi join 
> catalog_sales cs on ls.cs_order_number = cs.cs_order_number`. The `ls` table 
> is a small table with only one row. The `catalog_sales` table is a big table 
> with 10 billion rows. The task hangs:
> !taskhang up.png!
> And the SQL page is:
> !SQL.png!
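
For reference, a minimal sketch (hypothetical data in a spark-shell session; 
table names from the report, sizes illustrative) of the join shape being 
described:

{code}
// One-row table left-semi-joined against a very large table.
import spark.implicits._
val ls = Seq(1L).toDF("cs_order_number")        // small side: one row
val cs = spark.range(0, 10000L * 1000 * 1000)   // big side: ~10 billion rows
  .withColumnRenamed("id", "cs_order_number")
ls.join(cs, Seq("cs_order_number"), "left_semi").show()  // reported to hang on 2.2.1
{code}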






[jira] [Created] (SPARK-23514) Replace spark.sparkContext.hadoopConfiguration by spark.sessionState.newHadoopConf()

2018-02-24 Thread Xiao Li (JIRA)
Xiao Li created SPARK-23514:
---

 Summary: Replace spark.sparkContext.hadoopConfiguration by 
spark.sessionState.newHadoopConf()
 Key: SPARK-23514
 URL: https://issues.apache.org/jira/browse/SPARK-23514
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Xiao Li


Check all the places where we directly use 
{{spark.sparkContext.hadoopConfiguration}}. Instead, in some scenarios, it 
makes more sense to call {{spark.sessionState.newHadoopConf()}} which blends in 
settings from SQLConf.
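
As a rough illustration (a hypothetical snippet, not from this ticket; note 
that `sessionState` is `private[sql]`, so this only compiles inside the 
`org.apache.spark.sql` package), session-scoped settings are visible through 
`newHadoopConf()` but not through the shared SparkContext configuration:

{code}
val spark = org.apache.spark.sql.SparkSession.builder().master("local").getOrCreate()

spark.conf.set("my.custom.option", "session-value")   // session-scoped (SQLConf)

val scConf = spark.sparkContext.hadoopConfiguration   // shared across all sessions
val ssConf = spark.sessionState.newHadoopConf()       // copy of scConf + SQLConf entries

assert(scConf.get("my.custom.option") == null)              // not visible globally
assert(ssConf.get("my.custom.option") == "session-value")   // visible per session
{code}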







[jira] [Commented] (SPARK-23405) The task will hang when a small table left-semi-joins a big table

2018-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375973#comment-16375973
 ] 

Apache Spark commented on SPARK-23405:
--

User 'KaiXinXiaoLei' has created a pull request for this issue:
https://github.com/apache/spark/pull/20670

> The task will hang when a small table left-semi-joins a big table
> ---
>
> Key: SPARK-23405
> URL: https://issues.apache.org/jira/browse/SPARK-23405
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: KaiXinXIaoLei
>Priority: Major
> Attachments: SQL.png, taskhang up.png
>
>
> I run a SQL query: `select ls.cs_order_number from ls left semi join 
> catalog_sales cs on ls.cs_order_number = cs.cs_order_number`. The `ls` table 
> is a small table with only one row. The `catalog_sales` table is a big table 
> with 10 billion rows. The task hangs:
> !taskhang up.png!
> And the SQL page is:
> !SQL.png!






[jira] [Assigned] (SPARK-23405) The task will hang when a small table left-semi-joins a big table

2018-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23405:


Assignee: Apache Spark

> The task will hang when a small table left-semi-joins a big table
> ---
>
> Key: SPARK-23405
> URL: https://issues.apache.org/jira/browse/SPARK-23405
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: KaiXinXIaoLei
>Assignee: Apache Spark
>Priority: Major
> Attachments: SQL.png, taskhang up.png
>
>
> I run a SQL query: `select ls.cs_order_number from ls left semi join 
> catalog_sales cs on ls.cs_order_number = cs.cs_order_number`. The `ls` table 
> is a small table with only one row. The `catalog_sales` table is a big table 
> with 10 billion rows. The task hangs:
> !taskhang up.png!
> And the SQL page is:
> !SQL.png!






[jira] [Assigned] (SPARK-23405) The task will hang when a small table left-semi-joins a big table

2018-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23405:


Assignee: (was: Apache Spark)

> The task will hang when a small table left-semi-joins a big table
> ---
>
> Key: SPARK-23405
> URL: https://issues.apache.org/jira/browse/SPARK-23405
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: KaiXinXIaoLei
>Priority: Major
> Attachments: SQL.png, taskhang up.png
>
>
> I run a SQL query: `select ls.cs_order_number from ls left semi join 
> catalog_sales cs on ls.cs_order_number = cs.cs_order_number`. The `ls` table 
> is a small table with only one row. The `catalog_sales` table is a big table 
> with 10 billion rows. The task hangs:
> !taskhang up.png!
> And the SQL page is:
> !SQL.png!






[jira] [Created] (SPARK-23513) java.io.IOException: Expected 12 fields, but got 5 for row: Spark submit error

2018-02-24 Thread Rawia (JIRA)
Rawia created SPARK-23513:
--

 Summary: java.io.IOException: Expected 12 fields, but got 5 for 
row: Spark submit error
 Key: SPARK-23513
 URL: https://issues.apache.org/jira/browse/SPARK-23513
 Project: Spark
  Issue Type: Bug
  Components: EC2, Examples, Input/Output, Java API
Affects Versions: 2.2.0, 1.4.0
Reporter: Rawia 


Hello

I'm trying to run a Spark application (distributedWekaSpark), but when I use 
the spark-submit command I get this error:
{quote}ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) 
java.io.IOException: Expected 12 fields, but got 5 for row: 
outlook,temperature,humidity,windy,play
{quote}
I tried with other datasets, but the same error always appeared (always 12 
fields expected).
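
For what it's worth, the exception itself just signals a row/schema mismatch. 
A minimal sketch (not distributedWekaSpark's actual code) of the kind of check 
that raises it:

{code}
// A parser validating each CSV row against a fixed 12-attribute header.
val expectedFields = 12
val row = "outlook,temperature,humidity,windy,play"  // row from the error: 5 fields
val fields = row.split(",", -1)
if (fields.length != expectedFields) {
  throw new java.io.IOException(
    s"Expected $expectedFields fields, but got ${fields.length} for row: $row")
}
{code}

This suggests a mismatch between the schema the job was configured with and 
the dataset actually passed in.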






[jira] [Updated] (SPARK-22324) Upgrade Arrow to version 0.8.0 and upgrade Netty to 4.1.17

2018-02-24 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-22324:

Summary: Upgrade Arrow to version 0.8.0 and upgrade Netty to 4.1.17   (was: 
Upgrade Arrow to version 0.8.0)

> Upgrade Arrow to version 0.8.0 and upgrade Netty to 4.1.17 
> ---
>
> Key: SPARK-22324
> URL: https://issues.apache.org/jira/browse/SPARK-22324
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
> Fix For: 2.3.0
>
>
> Arrow version 0.8.0 is slated for release in early November, but I'd like to 
> start discussing now to help get all the work that's being done synced up.
> Along with upgrading the Arrow Java artifacts, pyarrow on our Jenkins test 
> envs will need to be upgraded as well, which will take a fair amount of work 
> and planning.
> One topic I'd like to discuss is whether pyarrow should be an installation 
> requirement for pyspark, i.e. when a user pip installs pyspark, it will also 
> install pyarrow.  If not, then is there a minimum version that needs to be 
> supported?  We currently have 0.4.1 installed on Jenkins.
> There are a number of improvements and cleanups in the current code that can 
> happen depending on what we decide (I'll link them all here later, but off 
> the top of my head):
> * Decimal bug fix and improved support
> * Improved internal casting between pyarrow and pandas (can clean up some 
> workarounds); this will also verify data bounds if the user specifies a type 
> and the data overflows.  See 
> https://github.com/apache/spark/pull/19459#discussion_r146421804
> * Better type checking when converting Spark types to Arrow
> * Timestamp conversion to microseconds (for Spark internal format)
> * Full support for using validity mask with 'object' types 
> https://github.com/apache/spark/pull/18664#discussion_r146567335
> * VectorSchemaRoot can call close more than once to simplify listener 
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala#L90






[jira] [Updated] (SPARK-23207) Shuffle+Repartition on a DataFrame could lead to incorrect answers

2018-02-24 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-23207:

Fix Version/s: (was: 2.4.0)

> Shuffle+Repartition on a DataFrame could lead to incorrect answers
> ---
>
> Key: SPARK-23207
> URL: https://issues.apache.org/jira/browse/SPARK-23207
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Jiang Xingbo
>Assignee: Jiang Xingbo
>Priority: Blocker
> Fix For: 2.3.0
>
>
> Currently shuffle repartition uses RoundRobinPartitioning; the generated 
> result is nondeterministic since the ordering of the input rows is not 
> determined.
> The bug can be triggered when there is a repartition call following a shuffle 
> (which would lead to non-deterministic row ordering), as the pattern below 
> shows:
> upstream stage -> repartition stage -> result stage
> (-> indicates a shuffle)
> When one of the executor processes goes down, some tasks of the repartition 
> stage will be retried and generate an inconsistent ordering, and some tasks 
> of the result stage will be retried, generating different data.
> The following code returns 931532 instead of 1000000:
> {code}
> import scala.sys.process._
> import org.apache.spark.TaskContext
> val res = spark.range(0, 1000 * 1000, 1).repartition(200).map { x =>
>   x
> }.repartition(200).map { x =>
>   // Kill the JVMs (pkill) on the first attempt of the first two partitions,
>   // forcing stage retries and thus a different row ordering.
>   if (TaskContext.get.attemptNumber == 0 && TaskContext.get.partitionId < 2) {
>     throw new Exception("pkill -f java".!!)
>   }
>   x
> }
> res.distinct().count()
> {code}






[jira] [Created] (SPARK-23512) Complex operations on a DataFrame corrupt data

2018-02-24 Thread Nazarii Bardiuk (JIRA)
Nazarii Bardiuk created SPARK-23512:
---

 Summary: Complex operations on a DataFrame corrupt data
 Key: SPARK-23512
 URL: https://issues.apache.org/jira/browse/SPARK-23512
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 2.2.1
Reporter: Nazarii Bardiuk


The following code demonstrates a sequence of transformations on a DataFrame 
that corrupts data:
{code}
from pyspark import SparkContext, SQLContext, Row
from pyspark.sql import Window
from pyspark.sql.functions import explode, lit, count, row_number, col, countDistinct

ss = SQLContext(SparkContext('local', 'pyspark'))
diffs = ss.createDataFrame([
    Row(id="1", a=["1"], b=["2"], t="2"),
    Row(id="2", a=["2"], b=["1"], t="1"),
    Row(id="3", a=["1"], b=["4", "3"], t="3"),
    Row(id="3", a=["1"], b=["4", "3"], t="4"),
    Row(id="4", a=["1"], b=["4", "3"], t="3"),
    Row(id="4", a=["1"], b=["4", "3"], t="4")
])

a = diffs.select("id", explode("a").alias("l"), "t").withColumn("problem", lit("a"))
b = diffs.select("id", explode("b").alias("l"), "t").withColumn("problem", lit("b")) \
    .filter(col("t") != col("l"))

all = a.union(b)

grouped = all \
    .groupBy("l", "t", "problem").agg(count("id").alias("count")) \
    .withColumn("rn", row_number().over(Window.partitionBy("l", "problem").orderBy(col("count").desc()))) \
    .withColumn("f", (col("rn") < 2) & (col("count") > 1)) \
    .cache()  # the change that broke the test

keep = grouped.filter("f").select("l", "t", "problem", "count")

agg = all.join(grouped.filter(~col("f")), ["l", "t", "problem"]) \
    .withColumn("t", lit(None)) \
    .groupBy("l", "t", "problem").agg(countDistinct("id").alias("count"))

keep.union(agg).show()  # corrupts column "problem"
agg.union(keep).show()  # as expected
{code}
 

Expected: the data in the "problem" column of both unions is the same.
Actual: the "problem" column loses data.
{code}
keep.union(agg).show() # corrupts column "problem"
+---+----+-------+-----+
|  l|   t|problem|count|
+---+----+-------+-----+
|  3|   4|      a|    2|
|  4|   3|      a|    2|
|  1|   4|      a|    2|
|  1|null|      a|    3|
|  2|null|      a|    1|
+---+----+-------+-----+

agg.union(keep).show() # as expected
+---+----+-------+-----+
|  l|   t|problem|count|
+---+----+-------+-----+
|  1|null|      a|    3|
|  2|null|      a|    1|
|  3|   4|      b|    2|
|  4|   3|      b|    2|
|  1|   4|      a|    2|
+---+----+-------+-----+

{code}
Note the cache() statement, which was the tipping point that broke our code; 
without it, everything works as expected.






[jira] [Assigned] (SPARK-22839) Refactor Kubernetes code for configuring driver/executor pods to use consistent and cleaner abstraction

2018-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22839:


Assignee: Apache Spark

> Refactor Kubernetes code for configuring driver/executor pods to use 
> consistent and cleaner abstraction
> ---
>
> Key: SPARK-22839
> URL: https://issues.apache.org/jira/browse/SPARK-22839
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Yinan Li
>Assignee: Apache Spark
>Priority: Major
>
> As discussed in https://github.com/apache/spark/pull/19954, the current code 
> for configuring the driver pod and the code for configuring the executor pods 
> do not use the same abstraction. Besides that, the current code leaves a lot 
> to be desired in terms of the level and cleanliness of abstraction. For 
> example, the current code passes many pieces of information around different 
> class hierarchies, which makes code review and maintenance challenging. We 
> need a thorough refactoring of the current code to achieve better, cleaner, 
> and more consistent abstraction.






[jira] [Commented] (SPARK-22839) Refactor Kubernetes code for configuring driver/executor pods to use consistent and cleaner abstraction

2018-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375859#comment-16375859
 ] 

Apache Spark commented on SPARK-22839:
--

User 'ifilonenko' has created a pull request for this issue:
https://github.com/apache/spark/pull/20669

> Refactor Kubernetes code for configuring driver/executor pods to use 
> consistent and cleaner abstraction
> ---
>
> Key: SPARK-22839
> URL: https://issues.apache.org/jira/browse/SPARK-22839
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Yinan Li
>Priority: Major
>
> As discussed in https://github.com/apache/spark/pull/19954, the current code 
> for configuring the driver pod and the code for configuring the executor pods 
> do not use the same abstraction. Besides that, the current code leaves a lot 
> to be desired in terms of the level and cleanliness of abstraction. For 
> example, the current code passes many pieces of information around different 
> class hierarchies, which makes code review and maintenance challenging. We 
> need a thorough refactoring of the current code to achieve better, cleaner, 
> and more consistent abstraction.






[jira] [Assigned] (SPARK-22839) Refactor Kubernetes code for configuring driver/executor pods to use consistent and cleaner abstraction

2018-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22839:


Assignee: (was: Apache Spark)

> Refactor Kubernetes code for configuring driver/executor pods to use 
> consistent and cleaner abstraction
> ---
>
> Key: SPARK-22839
> URL: https://issues.apache.org/jira/browse/SPARK-22839
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Yinan Li
>Priority: Major
>
> As discussed in https://github.com/apache/spark/pull/19954, the current code 
> for configuring the driver pod and the code for configuring the executor pods 
> do not use the same abstraction. Besides that, the current code leaves a lot 
> to be desired in terms of the level and cleanliness of abstraction. For 
> example, the current code passes many pieces of information around different 
> class hierarchies, which makes code review and maintenance challenging. We 
> need a thorough refactoring of the current code to achieve better, cleaner, 
> and more consistent abstraction.







[jira] [Created] (SPARK-23511) Catalyst: Implement GetField

2018-02-24 Thread Nadav Samet (JIRA)
Nadav Samet created SPARK-23511:
---

 Summary: Catalyst: Implement GetField
 Key: SPARK-23511
 URL: https://issues.apache.org/jira/browse/SPARK-23511
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 2.2.1
Reporter: Nadav Samet


Similar to Invoke, StaticInvoke, and NewInstance, it would be nice to have 
GetStaticField(expression, fieldName).

My use case is invoking a method on a companion object given the class of the 
companion object itself - it turns out such methods are not static. I'd like 
to be able to do something like this:

Invoke(GetStaticField(cls, "MODULE$", ...), "someMethod", ...)

My workaround is passing the companion object directly to Invoke() via 
Literal.fromObject, but I think having a general solution for calling methods 
on companion objects would be better.
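
For context, a small sketch (plain Java reflection, not Catalyst expressions) 
of what the proposed GetStaticField would resolve: a Scala companion object 
lives in the static MODULE$ field of its companion class, and its methods are 
instance methods on that singleton:

{code}
val cls = Class.forName("scala.math.BigDecimal$")   // companion class (note the $)
val companion = cls.getField("MODULE$").get(null)   // the singleton instance
// "someMethod" would then be invoked on `companion` rather than statically on
// the class, which is what Invoke(GetStaticField(...)) would express.
{code}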

 






[jira] [Comment Edited] (SPARK-16996) Hive ACID delta files not seen

2018-02-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375758#comment-16375758
 ] 

Frédéric ESCANDELL edited comment on SPARK-16996 at 2/24/18 8:16 PM:
-

On HDP 2.6, I confirm that the steps described by [~maver1ck] work.

[~ste...@apache.org], why did Hortonworks integrate Spark 2 with an older 
version of Hive 1.2 than the one distributed in HDP?


was (Author: fescandell):
On HDP 2.6, I confirm that the steps described by @Maciej Bryński work.

@Steve Loughran, why did Hortonworks integrate Spark 2 with an older version 
of Hive 1.2 than the one distributed in HDP?

> Hive ACID delta files not seen
> --
>
> Key: SPARK-16996
> URL: https://issues.apache.org/jira/browse/SPARK-16996
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2, 1.6.3, 2.1.2, 2.2.0
> Environment: Hive 1.2.1, Spark 1.5.2
>Reporter: Benjamin BONNET
>Priority: Critical
>
> spark-sql seems not to see data stored as delta files in an ACID Hive table.
> Actually I encountered the same problem as described here: 
> http://stackoverflow.com/questions/35955666/spark-sql-is-not-returning-records-for-hive-transactional-tables-on-hdp
> For example, create an ACID table with HiveCLI and insert a row:
> {code}
> set hive.support.concurrency=true;
> set hive.enforce.bucketing=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.compactor.initiator.on=true;
> set hive.compactor.worker.threads=1;
>  CREATE TABLE deltas(cle string,valeur string) CLUSTERED BY (cle) INTO 1 
> BUCKETS
> ROW FORMAT SERDE  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS 
>   INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
>   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO deltas VALUES("a","a");
> {code}
> Then make a query with the spark-sql CLI:
> {code}
> SELECT * FROM deltas;
> {code}
> That query returns no results and there are no errors in the logs.
> If you go to HDFS to inspect the table files, you find only delta 
> directories:
> {code}
> ~>hdfs dfs -ls /apps/hive/warehouse/deltas
> Found 1 items
> drwxr-x---   - me hdfs  0 2016-08-10 14:03 
> /apps/hive/warehouse/deltas/delta_0020943_0020943
> {code}
> Then, if you run compaction on that table (in HiveCLI):
> {code}
> ALTER TABLE deltas COMPACT 'MAJOR';
> {code}
> As a result, the delta will be compacted into a base file:
> {code}
> ~>hdfs dfs -ls /apps/hive/warehouse/deltas
> Found 1 items
> drwxrwxrwx   - me hdfs  0 2016-08-10 15:25 
> /apps/hive/warehouse/deltas/base_0020943
> {code}
> Go back to spark-sql and the same query gets a result:
> {code}
> SELECT * FROM deltas;
> a   a
> Time taken: 0.477 seconds, Fetched 1 row(s)
> {code}
> But the next time you make an insert into the Hive table:
> {code}
> INSERT INTO deltas VALUES("b","b");
> {code}
> spark-sql will immediately see the changes:
> {code}
> SELECT * FROM deltas;
> a   a
> b   b
> Time taken: 0.122 seconds, Fetched 2 row(s)
> {code}
> Yet there was no other compaction, but spark-sql "sees" the base AND the 
> delta file:
> {code}
> ~> hdfs dfs -ls /apps/hive/warehouse/deltas
> Found 2 items
> drwxrwxrwx   - valdata hdfs  0 2016-08-10 15:25 
> /apps/hive/warehouse/deltas/base_0020943
> drwxr-x---   - valdata hdfs  0 2016-08-10 15:31 
> /apps/hive/warehouse/deltas/delta_0020956_0020956
> {code}






[jira] [Comment Edited] (SPARK-16996) Hive ACID delta files not seen

2018-02-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375758#comment-16375758
 ] 

Frédéric ESCANDELL edited comment on SPARK-16996 at 2/24/18 8:15 PM:
-

On HDP 2.6, I confirm that the steps described by @Maciej Bryński work.

@Steve Loughran, why did Hortonworks integrate Spark 2 with an older version 
of Hive 1.2 than the one distributed in HDP?


was (Author: fescandell):
On HDP 2.6, I confirm that the steps described by Maciej Bryński work.

Steve Loughran, why did Hortonworks integrate Spark 2 with an older version 
of Hive 1.2 than the one distributed in HDP?

> Hive ACID delta files not seen
> --
>
> Key: SPARK-16996
> URL: https://issues.apache.org/jira/browse/SPARK-16996
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2, 1.6.3, 2.1.2, 2.2.0
> Environment: Hive 1.2.1, Spark 1.5.2
>Reporter: Benjamin BONNET
>Priority: Critical
>
> spark-sql seems not to see data stored as delta files in an ACID Hive table.
> Actually I encountered the same problem as described here: 
> http://stackoverflow.com/questions/35955666/spark-sql-is-not-returning-records-for-hive-transactional-tables-on-hdp
> For example, create an ACID table with HiveCLI and insert a row:
> {code}
> set hive.support.concurrency=true;
> set hive.enforce.bucketing=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.compactor.initiator.on=true;
> set hive.compactor.worker.threads=1;
>  CREATE TABLE deltas(cle string,valeur string) CLUSTERED BY (cle) INTO 1 
> BUCKETS
> ROW FORMAT SERDE  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS 
>   INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
>   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO deltas VALUES("a","a");
> {code}
> Then make a query with the spark-sql CLI:
> {code}
> SELECT * FROM deltas;
> {code}
> That query returns no results and there are no errors in the logs.
> If you go to HDFS to inspect the table files, you find only delta 
> directories:
> {code}
> ~>hdfs dfs -ls /apps/hive/warehouse/deltas
> Found 1 items
> drwxr-x---   - me hdfs  0 2016-08-10 14:03 
> /apps/hive/warehouse/deltas/delta_0020943_0020943
> {code}
> Then, if you run compaction on that table (in HiveCLI):
> {code}
> ALTER TABLE deltas COMPACT 'MAJOR';
> {code}
> As a result, the delta will be compacted into a base file:
> {code}
> ~>hdfs dfs -ls /apps/hive/warehouse/deltas
> Found 1 items
> drwxrwxrwx   - me hdfs  0 2016-08-10 15:25 
> /apps/hive/warehouse/deltas/base_0020943
> {code}
> Go back to spark-sql and the same query gets a result:
> {code}
> SELECT * FROM deltas;
> a   a
> Time taken: 0.477 seconds, Fetched 1 row(s)
> {code}
> But the next time you make an insert into the Hive table:
> {code}
> INSERT INTO deltas VALUES("b","b");
> {code}
> spark-sql will immediately see the changes:
> {code}
> SELECT * FROM deltas;
> a   a
> b   b
> Time taken: 0.122 seconds, Fetched 2 row(s)
> {code}
> Yet there was no other compaction, but spark-sql "sees" the base AND the 
> delta file:
> {code}
> ~> hdfs dfs -ls /apps/hive/warehouse/deltas
> Found 2 items
> drwxrwxrwx   - valdata hdfs  0 2016-08-10 15:25 
> /apps/hive/warehouse/deltas/base_0020943
> drwxr-x---   - valdata hdfs  0 2016-08-10 15:31 
> /apps/hive/warehouse/deltas/delta_0020956_0020956
> {code}






[jira] [Commented] (SPARK-16996) Hive ACID delta files not seen

2018-02-24 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375758#comment-16375758
 ] 

Frédéric ESCANDELL commented on SPARK-16996:


On HDP 2.6, I confirm that the steps described by Maciej Bryński work.

Steve Loughran, why did Hortonworks integrate Spark 2 with an older version 
of Hive 1.2 than the one distributed in HDP?

> Hive ACID delta files not seen
> --
>
> Key: SPARK-16996
> URL: https://issues.apache.org/jira/browse/SPARK-16996
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2, 1.6.3, 2.1.2, 2.2.0
> Environment: Hive 1.2.1, Spark 1.5.2
>Reporter: Benjamin BONNET
>Priority: Critical
>
> spark-sql seems not to see data stored as delta files in an ACID Hive table.
> Actually I encountered the same problem as described here: 
> http://stackoverflow.com/questions/35955666/spark-sql-is-not-returning-records-for-hive-transactional-tables-on-hdp
> For example, create an ACID table with HiveCLI and insert a row:
> {code}
> set hive.support.concurrency=true;
> set hive.enforce.bucketing=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> set hive.compactor.initiator.on=true;
> set hive.compactor.worker.threads=1;
>  CREATE TABLE deltas(cle string,valeur string) CLUSTERED BY (cle) INTO 1 
> BUCKETS
> ROW FORMAT SERDE  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS 
>   INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
>   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO deltas VALUES("a","a");
> {code}
> Then make a query with the spark-sql CLI:
> {code}
> SELECT * FROM deltas;
> {code}
> That query returns no results and there are no errors in the logs.
> If you go to HDFS to inspect the table files, you find only delta 
> directories:
> {code}
> ~>hdfs dfs -ls /apps/hive/warehouse/deltas
> Found 1 items
> drwxr-x---   - me hdfs  0 2016-08-10 14:03 
> /apps/hive/warehouse/deltas/delta_0020943_0020943
> {code}
> Then, if you run compaction on that table (in HiveCLI):
> {code}
> ALTER TABLE deltas COMPACT 'MAJOR';
> {code}
> As a result, the delta will be compacted into a base file:
> {code}
> ~>hdfs dfs -ls /apps/hive/warehouse/deltas
> Found 1 items
> drwxrwxrwx   - me hdfs  0 2016-08-10 15:25 
> /apps/hive/warehouse/deltas/base_0020943
> {code}
> Go back to spark-sql and the same query gets a result:
> {code}
> SELECT * FROM deltas;
> a   a
> Time taken: 0.477 seconds, Fetched 1 row(s)
> {code}
> But the next time you make an insert into the Hive table:
> {code}
> INSERT INTO deltas VALUES("b","b");
> {code}
> spark-sql will immediately see the changes:
> {code}
> SELECT * FROM deltas;
> a   a
> b   b
> Time taken: 0.122 seconds, Fetched 2 row(s)
> {code}
> Yet there was no other compaction, but spark-sql "sees" the base AND the 
> delta file:
> {code}
> ~> hdfs dfs -ls /apps/hive/warehouse/deltas
> Found 2 items
> drwxrwxrwx   - valdata hdfs  0 2016-08-10 15:25 
> /apps/hive/warehouse/deltas/base_0020943
> drwxr-x---   - valdata hdfs  0 2016-08-10 15:31 
> /apps/hive/warehouse/deltas/delta_0020956_0020956
> {code}






[jira] [Updated] (SPARK-23458) Flaky test: OrcQuerySuite

2018-02-24 Thread Dongjoon Hyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-23458:
--
Issue Type: Bug  (was: Task)

>  Flaky test: OrcQuerySuite
> --
>
> Key: SPARK-23458
> URL: https://issues.apache.org/jira/browse/SPARK-23458
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.4.0
> Environment: AMPLab Jenkins
>Reporter: Marco Gaido
>Priority: Major
>
> Sometimes we have UT failures with the following stacktrace:
> {code:java}
> sbt.ForkMain$ForkError: 
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 15 times over 
> 10.01396221801 seconds. Last failure message: There are 1 possibly leaked 
> file streams..
>   at 
> org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:421)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:439)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcTest.eventually(OrcTest.scala:45)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:308)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcTest.eventually(OrcTest.scala:45)
>   at 
> org.apache.spark.sql.test.SharedSparkSession$class.afterEach(SharedSparkSession.scala:114)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcQuerySuite.afterEach(OrcQuerySuite.scala:583)
>   at 
> org.scalatest.BeforeAndAfterEach$$anonfun$1.apply$mcV$sp(BeforeAndAfterEach.scala:234)
>   at 
> org.scalatest.Status$$anonfun$withAfterEffect$1.apply(Status.scala:379)
>   at 
> org.scalatest.Status$$anonfun$withAfterEffect$1.apply(Status.scala:375)
>   at org.scalatest.SucceededStatus$.whenCompleted(Status.scala:454)
>   at org.scalatest.Status$class.withAfterEffect(Status.scala:375)
>   at org.scalatest.SucceededStatus$.withAfterEffect(Status.scala:426)
>   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:232)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcQuerySuite.runTest(OrcQuerySuite.scala:583)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
>   at org.scalatest.Suite$class.run(Suite.scala:1147)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:52)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:52)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:480)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: sbt.ForkMain$ForkError: java.lang.IllegalStateException: There are 
> 1 possibly leaked file streams.
>   at 
> org.apache.spark.DebugFilesystem$.assertNoOpenStreams(DebugFilesystem.scala:54)
>   at 
> org.apache.spark.sql.test.SharedSparkSession$$anonfun$afterEach$1.apply$mcV$sp(SharedSparkSession.scala:115)
>   at 
> 

[jira] [Comment Edited] (SPARK-23458) Flaky test: OrcQuerySuite

2018-02-24 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375721#comment-16375721
 ] 

Dongjoon Hyun edited comment on SPARK-23458 at 2/24/18 6:37 PM:


I updated the title because the reported URL is for OrcQuerySuite, and added a 
link to `ParquetQuerySuite` because OrcQuerySuite's `Enabling/disabling 
ignoreCorruptFiles` test comes from `ParquetQuerySuite`. I'm looking at the 
following three together:
- ParquetQuerySuite
- OrcQuerySuite
- FileBasedDataSourceSuite


was (Author: dongjoon):
I added a link to `ParquetQuerySuite` because OrcQuerySuite's 
`Enabling/disabling ignoreCorruptFiles` test comes from `ParquetQuerySuite`. 
I'm looking at the following three together:
- ParquetQuerySuite
- OrcQuerySuite
- FileBasedDataSourceSuite

>  Flaky test: OrcQuerySuite
> --
>
> Key: SPARK-23458
> URL: https://issues.apache.org/jira/browse/SPARK-23458
> Project: Spark
>  Issue Type: Task
>  Components: SQL, Tests
>Affects Versions: 2.4.0
> Environment: AMPLab Jenkins
>Reporter: Marco Gaido
>Priority: Major
>
> Sometimes we have UT failures with the following stacktrace:
> {code:java}
> sbt.ForkMain$ForkError: 
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 15 times over 
> 10.01396221801 seconds. Last failure message: There are 1 possibly leaked 
> file streams..
>   at 
> org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:421)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:439)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcTest.eventually(OrcTest.scala:45)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:308)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcTest.eventually(OrcTest.scala:45)
>   at 
> org.apache.spark.sql.test.SharedSparkSession$class.afterEach(SharedSparkSession.scala:114)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcQuerySuite.afterEach(OrcQuerySuite.scala:583)
>   at 
> org.scalatest.BeforeAndAfterEach$$anonfun$1.apply$mcV$sp(BeforeAndAfterEach.scala:234)
>   at 
> org.scalatest.Status$$anonfun$withAfterEffect$1.apply(Status.scala:379)
>   at 
> org.scalatest.Status$$anonfun$withAfterEffect$1.apply(Status.scala:375)
>   at org.scalatest.SucceededStatus$.whenCompleted(Status.scala:454)
>   at org.scalatest.Status$class.withAfterEffect(Status.scala:375)
>   at org.scalatest.SucceededStatus$.withAfterEffect(Status.scala:426)
>   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:232)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcQuerySuite.runTest(OrcQuerySuite.scala:583)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
>   at org.scalatest.Suite$class.run(Suite.scala:1147)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:52)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:52)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:480)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 

[jira] [Updated] (SPARK-23458) Flaky test: OrcQuerySuite

2018-02-24 Thread Dongjoon Hyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-23458:
--
Component/s: Tests

>  Flaky test: OrcQuerySuite
> --
>
> Key: SPARK-23458
> URL: https://issues.apache.org/jira/browse/SPARK-23458
> Project: Spark
>  Issue Type: Task
>  Components: SQL, Tests
>Affects Versions: 2.4.0
> Environment: AMPLab Jenkins
>Reporter: Marco Gaido
>Priority: Major
>
> Sometimes we have UT failures with the following stacktrace:
> {code:java}
> sbt.ForkMain$ForkError: 
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 15 times over 
> 10.01396221801 seconds. Last failure message: There are 1 possibly leaked 
> file streams..
>   at 
> org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:421)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:439)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcTest.eventually(OrcTest.scala:45)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:308)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcTest.eventually(OrcTest.scala:45)
>   at 
> org.apache.spark.sql.test.SharedSparkSession$class.afterEach(SharedSparkSession.scala:114)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcQuerySuite.afterEach(OrcQuerySuite.scala:583)
>   at 
> org.scalatest.BeforeAndAfterEach$$anonfun$1.apply$mcV$sp(BeforeAndAfterEach.scala:234)
>   at 
> org.scalatest.Status$$anonfun$withAfterEffect$1.apply(Status.scala:379)
>   at 
> org.scalatest.Status$$anonfun$withAfterEffect$1.apply(Status.scala:375)
>   at org.scalatest.SucceededStatus$.whenCompleted(Status.scala:454)
>   at org.scalatest.Status$class.withAfterEffect(Status.scala:375)
>   at org.scalatest.SucceededStatus$.withAfterEffect(Status.scala:426)
>   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:232)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcQuerySuite.runTest(OrcQuerySuite.scala:583)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
>   at org.scalatest.Suite$class.run(Suite.scala:1147)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:52)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:52)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:480)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: sbt.ForkMain$ForkError: java.lang.IllegalStateException: There are 
> 1 possibly leaked file streams.
>   at 
> org.apache.spark.DebugFilesystem$.assertNoOpenStreams(DebugFilesystem.scala:54)
>   at 
> org.apache.spark.sql.test.SharedSparkSession$$anonfun$afterEach$1.apply$mcV$sp(SharedSparkSession.scala:115)
>   at 
> 

[jira] [Updated] (SPARK-23458) Flaky test: OrcQuerySuite

2018-02-24 Thread Dongjoon Hyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-23458:
--
Summary:  Flaky test: OrcQuerySuite  (was: OrcSuite flaky test)

>  Flaky test: OrcQuerySuite
> --
>
> Key: SPARK-23458
> URL: https://issues.apache.org/jira/browse/SPARK-23458
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 2.4.0
> Environment: AMPLab Jenkins
>Reporter: Marco Gaido
>Priority: Major
>
> Sometimes we have UT failures with the following stacktrace:
> {code:java}
> sbt.ForkMain$ForkError: 
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 15 times over 
> 10.01396221801 seconds. Last failure message: There are 1 possibly leaked 
> file streams..
>   at 
> org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:421)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:439)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcTest.eventually(OrcTest.scala:45)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:308)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcTest.eventually(OrcTest.scala:45)
>   at 
> org.apache.spark.sql.test.SharedSparkSession$class.afterEach(SharedSparkSession.scala:114)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcQuerySuite.afterEach(OrcQuerySuite.scala:583)
>   at 
> org.scalatest.BeforeAndAfterEach$$anonfun$1.apply$mcV$sp(BeforeAndAfterEach.scala:234)
>   at 
> org.scalatest.Status$$anonfun$withAfterEffect$1.apply(Status.scala:379)
>   at 
> org.scalatest.Status$$anonfun$withAfterEffect$1.apply(Status.scala:375)
>   at org.scalatest.SucceededStatus$.whenCompleted(Status.scala:454)
>   at org.scalatest.Status$class.withAfterEffect(Status.scala:375)
>   at org.scalatest.SucceededStatus$.withAfterEffect(Status.scala:426)
>   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:232)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcQuerySuite.runTest(OrcQuerySuite.scala:583)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
>   at org.scalatest.Suite$class.run(Suite.scala:1147)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:52)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:52)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:480)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: sbt.ForkMain$ForkError: java.lang.IllegalStateException: There are 
> 1 possibly leaked file streams.
>   at 
> org.apache.spark.DebugFilesystem$.assertNoOpenStreams(DebugFilesystem.scala:54)
>   at 
> org.apache.spark.sql.test.SharedSparkSession$$anonfun$afterEach$1.apply$mcV$sp(SharedSparkSession.scala:115)
>   at 
> 

[jira] [Comment Edited] (SPARK-23458) OrcSuite flaky test

2018-02-24 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375721#comment-16375721
 ] 

Dongjoon Hyun edited comment on SPARK-23458 at 2/24/18 6:34 PM:


I added a link to `ParquetQuerySuite` because OrcQuerySuite's 
`Enabling/disabling ignoreCorruptFiles` test comes from `ParquetQuerySuite`. 
I'm looking at the following three together:
- ParquetQuerySuite
- OrcQuerySuite
- FileBasedDataSourceSuite


was (Author: dongjoon):
I added a link to `ParquetQuerySuite` because OrcQuerySuite's 
`Enabling/disabling ignoreCorruptFiles` test comes from `ParquetQuerySuite`.

> OrcSuite flaky test
> ---
>
> Key: SPARK-23458
> URL: https://issues.apache.org/jira/browse/SPARK-23458
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 2.4.0
> Environment: AMPLab Jenkins
>Reporter: Marco Gaido
>Priority: Major
>
> Sometimes we have UT failures with the following stacktrace:
> {code:java}
> sbt.ForkMain$ForkError: 
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 15 times over 
> 10.01396221801 seconds. Last failure message: There are 1 possibly leaked 
> file streams..
>   at 
> org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:421)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:439)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcTest.eventually(OrcTest.scala:45)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:308)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcTest.eventually(OrcTest.scala:45)
>   at 
> org.apache.spark.sql.test.SharedSparkSession$class.afterEach(SharedSparkSession.scala:114)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcQuerySuite.afterEach(OrcQuerySuite.scala:583)
>   at 
> org.scalatest.BeforeAndAfterEach$$anonfun$1.apply$mcV$sp(BeforeAndAfterEach.scala:234)
>   at 
> org.scalatest.Status$$anonfun$withAfterEffect$1.apply(Status.scala:379)
>   at 
> org.scalatest.Status$$anonfun$withAfterEffect$1.apply(Status.scala:375)
>   at org.scalatest.SucceededStatus$.whenCompleted(Status.scala:454)
>   at org.scalatest.Status$class.withAfterEffect(Status.scala:375)
>   at org.scalatest.SucceededStatus$.withAfterEffect(Status.scala:426)
>   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:232)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcQuerySuite.runTest(OrcQuerySuite.scala:583)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
>   at org.scalatest.Suite$class.run(Suite.scala:1147)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:52)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:52)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:480)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at 

[jira] [Commented] (SPARK-23458) OrcSuite flaky test

2018-02-24 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375721#comment-16375721
 ] 

Dongjoon Hyun commented on SPARK-23458:
---

I added a link to `ParquetQuerySuite` because OrcQuerySuite's 
`Enabling/disabling ignoreCorruptFiles` test comes from `ParquetQuerySuite`.

> OrcSuite flaky test
> ---
>
> Key: SPARK-23458
> URL: https://issues.apache.org/jira/browse/SPARK-23458
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 2.4.0
> Environment: AMPLab Jenkins
>Reporter: Marco Gaido
>Priority: Major
>
> Sometimes we have UT failures with the following stacktrace:
> {code:java}
> sbt.ForkMain$ForkError: 
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 15 times over 
> 10.01396221801 seconds. Last failure message: There are 1 possibly leaked 
> file streams..
>   at 
> org.scalatest.concurrent.Eventually$class.tryTryAgain$1(Eventually.scala:421)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:439)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcTest.eventually(OrcTest.scala:45)
>   at 
> org.scalatest.concurrent.Eventually$class.eventually(Eventually.scala:308)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcTest.eventually(OrcTest.scala:45)
>   at 
> org.apache.spark.sql.test.SharedSparkSession$class.afterEach(SharedSparkSession.scala:114)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcQuerySuite.afterEach(OrcQuerySuite.scala:583)
>   at 
> org.scalatest.BeforeAndAfterEach$$anonfun$1.apply$mcV$sp(BeforeAndAfterEach.scala:234)
>   at 
> org.scalatest.Status$$anonfun$withAfterEffect$1.apply(Status.scala:379)
>   at 
> org.scalatest.Status$$anonfun$withAfterEffect$1.apply(Status.scala:375)
>   at org.scalatest.SucceededStatus$.whenCompleted(Status.scala:454)
>   at org.scalatest.Status$class.withAfterEffect(Status.scala:375)
>   at org.scalatest.SucceededStatus$.withAfterEffect(Status.scala:426)
>   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:232)
>   at 
> org.apache.spark.sql.execution.datasources.orc.OrcQuerySuite.runTest(OrcQuerySuite.scala:583)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
>   at org.scalatest.Suite$class.run(Suite.scala:1147)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:52)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:52)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:480)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: sbt.ForkMain$ForkError: java.lang.IllegalStateException: There are 
> 1 possibly leaked file streams.
>   at 
> org.apache.spark.DebugFilesystem$.assertNoOpenStreams(DebugFilesystem.scala:54)
>   at 
> 

[jira] [Commented] (SPARK-23510) Support read data from Hive 2.2 and Hive 2.3 metastore

2018-02-24 Thread Yuming Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375660#comment-16375660
 ] 

Yuming Wang commented on SPARK-23510:
-

[~JPMoresmau] Can you try https://github.com/apache/spark/pull/20668?

> Support read data from Hive 2.2 and Hive 2.3 metastore
> --
>
> Key: SPARK-23510
> URL: https://issues.apache.org/jira/browse/SPARK-23510
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23510) Support read data from Hive 2.2 and Hive 2.3 metastore

2018-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375636#comment-16375636
 ] 

Apache Spark commented on SPARK-23510:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/20668

> Support read data from Hive 2.2 and Hive 2.3 metastore
> --
>
> Key: SPARK-23510
> URL: https://issues.apache.org/jira/browse/SPARK-23510
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23510) Support read data from Hive 2.2 and Hive 2.3 metastore

2018-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23510:


Assignee: Apache Spark

> Support read data from Hive 2.2 and Hive 2.3 metastore
> --
>
> Key: SPARK-23510
> URL: https://issues.apache.org/jira/browse/SPARK-23510
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23510) Support read data from Hive 2.2 and Hive 2.3 metastore

2018-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23510:


Assignee: (was: Apache Spark)

> Support read data from Hive 2.2 and Hive 2.3 metastore
> --
>
> Key: SPARK-23510
> URL: https://issues.apache.org/jira/browse/SPARK-23510
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23510) Support read data from Hive 2.2 and Hive 2.3 metastore

2018-02-24 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-23510:
---

 Summary: Support read data from Hive 2.2 and Hive 2.3 metastore
 Key: SPARK-23510
 URL: https://issues.apache.org/jira/browse/SPARK-23510
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Yuming Wang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20411) New features for expression.scalalang.typed

2018-02-24 Thread Diego Fanesi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375559#comment-16375559
 ] 

Diego Fanesi commented on SPARK-20411:
--

In SPARK-20890, new default aggregators for Long and Double are being added. I 
think this is still not enough.

Spark should provide a trait that requires the sum and compare operators to be 
defined, and the default aggregators sum(), min(), and max() should then work 
on every type that extends the trait; a rough sketch follows below.

Maybe we could also make a separate trait per operator, so we don't force 
developers to implement both operators when only one is needed.

This would let any developer use the default aggregators on any custom type 
without having to define a custom aggregator for every new case class.
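
A rough sketch of what that could look like (hypothetical names; nothing like 
this exists in Spark today):
{code:java}
// One trait per operator, so a type opts in only to what it needs.
trait HasSum[T] {
  def zero: T
  def plus(a: T, b: T): T
}

trait HasCompare[T] {
  def compare(a: T, b: T): Int
}

// A generic sum that works for any T providing a HasSum instance.
def sumAll[T](xs: Seq[T])(implicit s: HasSum[T]): T =
  xs.foldLeft(s.zero)(s.plus)

// A custom case class opts in by supplying an instance once...
case class Score(value: Double)
implicit val scoreSum: HasSum[Score] = new HasSum[Score] {
  def zero: Score = Score(0.0)
  def plus(a: Score, b: Score): Score = Score(a.value + b.value)
}

// ...and the generic aggregator just works:
// sumAll(Seq(Score(1.0), Score(2.5))) == Score(3.5)
{code}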

 

> New features for expression.scalalang.typed
> ---
>
> Key: SPARK-20411
> URL: https://issues.apache.org/jira/browse/SPARK-20411
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.1.0
>Reporter: Loic Descotte
>Priority: Minor
>
> In Spark 2 it is possible to use typed expressions for aggregation methods: 
> {code}
> import org.apache.spark.sql.expressions.scalalang._ 
> dataset.groupByKey(_.productId)
>   .agg(typed.sum[Token](_.score))
>   .toDF("productId", "sum")
>   .orderBy('productId)
>   .show
> {code}
> It seems that only avg, count and sum are defined : 
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/expressions/scalalang/typed.html
> It is very nice to be able to use a typesafe DSL, but it would be good to 
> have more possibilities, like min and max functions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23509) Upgrade commons-net from 2.2 to 3.1

2018-02-24 Thread PandaMonkey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PandaMonkey updated SPARK-23509:

Description: 
Hi, after analyzing spark-master\core\pom.xml, we found that Spark-core depends 
on org.apache.hadoop:hadoop-client:2.6.5, which transitively introduces 
commons-net:3.1. At the same time, Spark-core directly depends on the older 
commons-net:2.2. Looking further into the source code, these two versions of 
commons-net differ in many ways. This dependency conflict brings a high risk of 
"NoClassDefFoundError" or "NoSuchMethodError" errors at runtime. Please take 
note of this problem; upgrading commons-net from 2.2 to 3.1 may be a good 
choice. Hope this report can help you. Thanks!

 

Regards,

Panda

  was:
Hi, after analyzing spark-master\core\pom.xml, we found that Spark-core depends 
on org.apache.hadoop:hadoop-client:2.6.5, which transitively introduces 
commons-net:3.1. At the same time, Spark-core directly depends on the older 
commons-net:2.2. Looking further into the source code, these two versions of 
commons-net differ in many ways. This dependency conflict brings a high risk of 
"NoClassDefFoundError" or "NoSuchMethodError" errors at runtime. Please take 
note of this problem; upgrading commons-net from 2.2 to 3.1 may be a good 
choice. Hope this report can help you. Thanks!

 

Regards,

Panda


> Upgrade commons-net from 2.2 to 3.1
> ---
>
> Key: SPARK-23509
> URL: https://issues.apache.org/jira/browse/SPARK-23509
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: PandaMonkey
>Priority: Major
> Fix For: 2.4.0
>
> Attachments: spark.txt
>
>
> Hi, after analyzing spark-master\core\pom.xml, we found that Spark-core 
> depends on org.apache.hadoop:hadoop-client:2.6.5, which transitively 
> introduces commons-net:3.1. At the same time, Spark-core directly depends on 
> the older commons-net:2.2. Looking further into the source code, these two 
> versions of commons-net differ in many ways. This dependency conflict brings 
> a high risk of "NoClassDefFoundError" or "NoSuchMethodError" errors at 
> runtime. Please take note of this problem; upgrading commons-net from 2.2 to 
> 3.1 may be a good choice. Hope this report can help you. Thanks!
>  
> Regards,
> Panda
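
A quick illustration of the kind of fix being proposed: pin commons-net to a 
single version so the direct 2.2 and the transitive 3.1 cannot clash on the 
classpath. Sketched here in sbt syntax as a general technique, not Spark's 
actual build change (Spark manages versions in its Maven pom):
{code:java}
// build.sbt (sketch): force one commons-net version so the direct 2.2 and
// the transitive 3.1 (via hadoop-client) cannot conflict at runtime.
dependencyOverrides += "commons-net" % "commons-net" % "3.1"
{code}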



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23509) Upgrade commons-net from 2.2 to 3.1

2018-02-24 Thread PandaMonkey (JIRA)
PandaMonkey created SPARK-23509:
---

 Summary: Upgrade commons-net from 2.2 to 3.1
 Key: SPARK-23509
 URL: https://issues.apache.org/jira/browse/SPARK-23509
 Project: Spark
  Issue Type: Dependency upgrade
  Components: Spark Core
Affects Versions: 2.4.0
Reporter: PandaMonkey
 Fix For: 2.4.0
 Attachments: spark.txt

Hi, after analyzing spark-master\core\pom.xml, we found that Spark-core depends 
on org.apache.hadoop:hadoop-client:2.6.5, which transitively introduces 
commons-net:3.1. At the same time, Spark-core directly depends on the older 
commons-net:2.2. Looking further into the source code, these two versions of 
commons-net differ in many ways. This dependency conflict brings a high risk of 
"NoClassDefFoundError" or "NoSuchMethodError" errors at runtime. Please take 
note of this problem; upgrading commons-net from 2.2 to 3.1 may be a good 
choice. Hope this report can help you. Thanks!

 

Regards,

Panda



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23509) Upgrade commons-net from 2.2 to 3.1

2018-02-24 Thread PandaMonkey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PandaMonkey updated SPARK-23509:

Attachment: spark.txt

> Upgrade commons-net from 2.2 to 3.1
> ---
>
> Key: SPARK-23509
> URL: https://issues.apache.org/jira/browse/SPARK-23509
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: PandaMonkey
>Priority: Major
> Fix For: 2.4.0
>
> Attachments: spark.txt
>
>
> Hi, after analyzing spark-master\core\pom.xml, we found that Spark-core 
> depends on org.apache.hadoop:hadoop-client:2.6.5, which transitively 
> introduces commons-net:3.1. At the same time, Spark-core directly depends on 
> the older commons-net:2.2. Looking further into the source code, these two 
> versions of commons-net differ in many ways. This dependency conflict brings 
> a high risk of "NoClassDefFoundError" or "NoSuchMethodError" errors at 
> runtime. Please take note of this problem; upgrading commons-net from 2.2 to 
> 3.1 may be a good choice. Hope this report can help you. Thanks!
>  
> Regards,
> Panda



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23508) blockManagerIdCache in BlockManagerId may cause oom

2018-02-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375492#comment-16375492
 ] 

Apache Spark commented on SPARK-23508:
--

User 'caneGuy' has created a pull request for this issue:
https://github.com/apache/spark/pull/20667

> blockManagerIdCache in BlockManagerId may cause oom
> ---
>
> Key: SPARK-23508
> URL: https://issues.apache.org/jira/browse/SPARK-23508
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 2.1.1, 2.2.1
>Reporter: zhoukang
>Priority: Major
> Attachments: elepahnt-oom1.png, elephant-oom.png
>
>
> blockManagerIdCache in BlockManagerId never removes old values, which may 
> cause an OOM:
> {code:java}
> val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
> {code}
> Whenever we create a new BlockManagerId, it is put into this map.
> Below is a jmap dump:
> !elepahnt-oom1.png!
> !elephant-oom.png!
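
One plausible mitigation, sketched below with a size-bounded Guava cache (an 
assumption for illustration; the actual pull request may take a different 
approach):
{code:java}
import com.google.common.cache.{CacheBuilder, CacheLoader}
import org.apache.spark.storage.BlockManagerId

// Sketch: bound the interning cache so stale BlockManagerIds are evicted
// instead of accumulating for the lifetime of the process.
val blockManagerIdCache = CacheBuilder.newBuilder()
  .maximumSize(10000)
  .build(new CacheLoader[BlockManagerId, BlockManagerId] {
    override def load(id: BlockManagerId): BlockManagerId = id
  })

// Interning lookup: returns the cached instance when present, otherwise
// caches and returns the given one.
def getCachedBlockManagerId(id: BlockManagerId): BlockManagerId =
  blockManagerIdCache.get(id)
{code}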



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23508) blockManagerIdCache in BlockManagerId may cause oom

2018-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23508:


Assignee: (was: Apache Spark)

> blockManagerIdCache in BlockManagerId may cause oom
> ---
>
> Key: SPARK-23508
> URL: https://issues.apache.org/jira/browse/SPARK-23508
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 2.1.1, 2.2.1
>Reporter: zhoukang
>Priority: Major
> Attachments: elepahnt-oom1.png, elephant-oom.png
>
>
> blockManagerIdCache in BlockManagerId never removes old values, which may 
> cause an OOM:
> {code:java}
> val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
> {code}
> Whenever we create a new BlockManagerId, it is put into this map.
> Below is a jmap dump:
> !elepahnt-oom1.png!
> !elephant-oom.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23508) blockManagerIdCache in BlockManagerId may cause oom

2018-02-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23508:


Assignee: Apache Spark

> blockManagerIdCache in BlockManagerId may cause oom
> ---
>
> Key: SPARK-23508
> URL: https://issues.apache.org/jira/browse/SPARK-23508
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 2.1.1, 2.2.1
>Reporter: zhoukang
>Assignee: Apache Spark
>Priority: Major
> Attachments: elepahnt-oom1.png, elephant-oom.png
>
>
> blockManagerIdCache in BlockManagerId never removes old values, which may 
> cause an OOM:
> {code:java}
> val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
> {code}
> Whenever we create a new BlockManagerId, it is put into this map.
> Below is a jmap dump:
> !elepahnt-oom1.png!
> !elephant-oom.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23508) blockManagerIdCache in BlockManagerId may cause oom

2018-02-24 Thread zhoukang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhoukang updated SPARK-23508:
-
Description: 
blockManagerIdCache in BlockManagerId never removes old values, which may 
cause an OOM:
{code:java}
val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
{code}
Whenever we create a new BlockManagerId, it is put into this map.

Below is a jmap dump:

!elepahnt-oom1.png!

!elephant-oom.png!

  was:
blockManagerIdCache in BlockManagerId never removes old values, which may 
cause an OOM:
{code:java}
val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
{code}
Whenever we create a new BlockManagerId, it is put into this map.

!elephant-oom.png!


> blockManagerIdCache in BlockManagerId may cause oom
> ---
>
> Key: SPARK-23508
> URL: https://issues.apache.org/jira/browse/SPARK-23508
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 2.1.1, 2.2.1
>Reporter: zhoukang
>Priority: Major
> Attachments: elepahnt-oom1.png, elephant-oom.png
>
>
> blockManagerIdCache in BlockManagerId never removes old values, which may 
> cause an OOM:
> {code:java}
> val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
> {code}
> Whenever we create a new BlockManagerId, it is put into this map.
> Below is a jmap dump:
> !elepahnt-oom1.png!
> !elephant-oom.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23508) blockManagerIdCache in BlockManagerId may cause oom

2018-02-24 Thread zhoukang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhoukang updated SPARK-23508:
-
Description: 
blockManagerIdCache in BlockManagerId never removes old values, which may 
cause an OOM:
{code:java}
val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
{code}
Whenever we create a new BlockManagerId, it is put into this map.

!elephant-oom.png!

  was:
blockManagerIdCache in BlockManagerId never removes old values, which may 
cause an OOM:
{code:java}
val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
{code}
Whenever we create a new BlockManagerId, it is put into this map.


> blockManagerIdCache in BlockManagerId may cause oom
> ---
>
> Key: SPARK-23508
> URL: https://issues.apache.org/jira/browse/SPARK-23508
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 2.1.1, 2.2.1
>Reporter: zhoukang
>Priority: Major
> Attachments: elepahnt-oom1.png, elephant-oom.png
>
>
> blockManagerIdCache in BlockManagerId never removes old values, which may 
> cause an OOM:
> {code:java}
> val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
> {code}
> Whenever we create a new BlockManagerId, it is put into this map.
> !elephant-oom.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23508) blockManagerIdCache in BlockManagerId may cause oom

2018-02-24 Thread zhoukang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhoukang updated SPARK-23508:
-
Attachment: elephant-oom.png
elepahnt-oom1.png

> blockManagerIdCache in BlockManagerId may cause oom
> ---
>
> Key: SPARK-23508
> URL: https://issues.apache.org/jira/browse/SPARK-23508
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 2.1.1, 2.2.1
>Reporter: zhoukang
>Priority: Major
> Attachments: elepahnt-oom1.png, elephant-oom.png
>
>
> blockManagerIdCache in BlockManagerId never removes old values, which may 
> cause an OOM:
> {code:java}
> val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
> {code}
> Whenever we create a new BlockManagerId, it is put into this map.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23508) blockManagerIdCache in BlockManagerId may cause oom

2018-02-24 Thread zhoukang (JIRA)
zhoukang created SPARK-23508:


 Summary: blockManagerIdCache in BlockManagerId may cause oom
 Key: SPARK-23508
 URL: https://issues.apache.org/jira/browse/SPARK-23508
 Project: Spark
  Issue Type: Bug
  Components: Deploy, Spark Core
Affects Versions: 2.2.1, 2.1.1
Reporter: zhoukang


blockManagerIdCache in BlockManagerId never removes old values, which may 
cause an OOM:
{code:java}
val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
{code}
Whenever we create a new BlockManagerId, it is put into this map.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23448) Dataframe returns wrong result when column don't respect datatype

2018-02-24 Thread Liang-Chi Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16375439#comment-16375439
 ] 

Liang-Chi Hsieh commented on SPARK-23448:
-

In fact this is exactly the JSON parser's behavior, not a bug. We don't allow 
partial results for corrupted records: except for the field configured by 
{{columnNameOfCorruptRecord}}, all fields are set to {{null}}.
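
For anyone who wants the malformed row surfaced rather than a fully-null row, a 
minimal sketch (assuming the {{spark}} session and {{input}} path from the 
report; the column name is arbitrary):
{code:java}
import org.apache.spark.sql.types._

// Add a corrupt-record column to the schema, and point the reader at it,
// so malformed rows are captured there instead of being silently nulled out.
val schemaWithCorrupt = StructType(Seq(
  StructField("attr1", StringType, true),
  StructField("attr2", ArrayType(StringType, true), true),
  StructField("_corrupt_record", StringType, true)))

spark.read
  .schema(schemaWithCorrupt)
  .option("columnNameOfCorruptRecord", "_corrupt_record")
  .json(input)
  .show(false)
{code}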

> Dataframe returns wrong result when column don't respect datatype
> -
>
> Key: SPARK-23448
> URL: https://issues.apache.org/jira/browse/SPARK-23448
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
> Environment: Local
>Reporter: Ahmed ZAROUI
>Priority: Major
>
> I have the following JSON file that contains some noisy data (a String 
> instead of an Array):
>  
> {code:java}
> {"attr1":"val1","attr2":"[\"val2\"]"}
> {"attr1":"val1","attr2":["val2"]}
> {code}
> And I need to specify the schema programmatically, like this:
>  
> {code:java}
> implicit val spark = SparkSession
>   .builder()
>   .master("local[*]")
>   .config("spark.ui.enabled", false)
>   .config("spark.sql.caseSensitive", "True")
>   .getOrCreate()
> import spark.implicits._
> val schema = StructType(
>   Seq(StructField("attr1", StringType, true),
>   StructField("attr2", ArrayType(StringType, true), true)))
> spark.read.schema(schema).json(input).collect().foreach(println)
> {code}
> The result given by this code is:
> {code:java}
> [null,null]
> [val1,WrappedArray(val2)]
> {code}
> Instead of putting null only in the corrupted column, all columns of the 
> first message are null.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23507) Migrate file-based data sources to data source v2

2018-02-24 Thread Gengliang Wang (JIRA)
Gengliang Wang created SPARK-23507:
--

 Summary: Migrate file-based data sources to data source v2
 Key: SPARK-23507
 URL: https://issues.apache.org/jira/browse/SPARK-23507
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.1
Reporter: Gengliang Wang


Migrate file-based data sources to data source v2, including:
 # Parquet
 # ORC
 # JSON
 # CSV
 # JDBC
 # Text



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org