[GitHub] spark issue #20823: [SPARK-23674] Add Spark ML Listener for Tracking ML Pipe...

2018-10-16 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/20823
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22550: [SPARK-25501] Kafka delegation token support

2018-10-01 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/22550
  
Closing this one since another PR is working on it.


---




[GitHub] spark pull request #22550: [SPARK-25501] Kafka delegation token support

2018-10-01 Thread merlintang
Github user merlintang closed the pull request at:

https://github.com/apache/spark/pull/22550


---




[GitHub] spark issue #22598: [SPARK-25501][SS] Add kafka delegation token support.

2018-10-01 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/22598
  
@gaborgsomogyi thanks for your PR. I am going through the details and testing it on my local machine.


---




[GitHub] spark pull request #22550: [SPARK-25501] Kafka delegation token support

2018-09-25 Thread merlintang
GitHub user merlintang opened a pull request:

https://github.com/apache/spark/pull/22550

[SPARK-25501] Kafka delegation token support

## What changes were proposed in this pull request?

Kafka is adding support for delegation tokens, so Spark needs to obtain Kafka delegation tokens in the same way it does for Hive, HDFS, and HBase.

## How was this patch tested?

Manually checked.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/merlintang/spark kafka-dtoken

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22550.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22550


commit c59ea5eaffc9889074226cf96a0e704672cdb290
Author: Mingjie Tang 
Date:   2018-09-25T21:20:30Z

[RMP-11860][SPARK-25501] Kafka Delegation Token Support for Spark

commit 7202ff968fa9a330e112a4958e38fd7f36e53341
Author: Mingjie Tang 
Date:   2018-09-26T00:31:35Z

update with kafka config




---




[GitHub] spark issue #21455: [SPARK-24093][DStream][Minor]Make some fields of KafkaSt...

2018-06-26 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/21455
  
@gabor These fields are important for us to understand the Spark Kafka streaming data, such as the topic name. We can use this information to track the system status.

On Tue, Jun 26, 2018 at 4:52 AM Gabor Somogyi 
wrote:

> Why is it required at all? Making things visible without proper reason is
> not a good idea.
>
> —
> You are receiving this because you authored the thread.
> Reply to this email directly, view it on GitHub
> <https://github.com/apache/spark/pull/21455#issuecomment-400279326>, or 
mute
> the thread
> 
<https://github.com/notifications/unsubscribe-auth/ABXY-RAJjzhNWzKkXIaFGViMpWFEB0hEks5uAiBlgaJpZM4USHvN>
> .
>



---




[GitHub] spark issue #20823: [SPARK-23674] Add Spark ML Listener for Tracking ML Pipe...

2018-06-11 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/20823
  
@holdenk can you look at this PR? Thanks in advance.


---




[GitHub] spark issue #21455: [SPARK-24093][DStream][Minor]Make some fields of KafkaSt...

2018-06-11 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/21455
  
@jerryshao Actually, we cannot use reflection to get this field information.


---




[GitHub] spark pull request #21504: [SPARK-24479][SS] Added config for registering st...

2018-06-07 Thread merlintang
Github user merlintang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21504#discussion_r193911087
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala
 ---
@@ -55,6 +56,11 @@ class StreamingQueryManager private[sql] (sparkSession: SparkSession) extends Lo
   @GuardedBy("awaitTerminationLock")
   private var lastTerminatedQuery: StreamingQuery = null
 
+  sparkSession.sparkContext.conf.get(STREAMING_QUERY_LISTENERS).foreach { classNames =>
+    Utils.loadExtensions(classOf[StreamingQueryListener], classNames,
+      sparkSession.sparkContext.conf).foreach(addListener)
+  }
+
+
--- End diff --

Two comments here:
1. We need to log the registration here.
2. We need to wrap this in a try/catch: registration can fail, and an uncaught failure would break the job.
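A minimal sketch of both suggestions, using only the Scala standard library. It assumes a hypothetical `Listener` trait and a `register` callback standing in for Spark's `StreamingQueryListener` and `addListener`, and uses `println` in place of Spark's logging. Each class name is instantiated and registered independently, so one bad listener is logged instead of failing the whole job.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for StreamingQueryListener; this is not Spark's API.
trait Listener

// A trivial listener used to demonstrate successful registration.
class GoodListener extends Listener

object ListenerRegistration {
  /**
   * Instantiates each class name reflectively and passes it to `register`.
   * Every outcome is logged (println stands in for a real logger), and a
   * non-fatal failure for one class name is reported instead of rethrown.
   * Returns the class names that were registered successfully.
   */
  def registerAll(classNames: Seq[String], register: Listener => Unit): Seq[String] =
    classNames.flatMap { name =>
      val attempt = Try {
        Class.forName(name).getDeclaredConstructor().newInstance() match {
          case l: Listener => l
          case _ => throw new IllegalArgumentException(s"$name does not implement Listener")
        }
      }.flatMap(listener => Try(register(listener)))

      attempt match {
        case Success(_) =>
          println(s"Registered listener $name")
          Some(name)
        case Failure(e) =>
          println(s"Failed to register listener $name: $e")
          None
      }
    }
}
```

Returning the successfully registered names lets the caller decide whether a partial registration is acceptable, rather than deciding that inside the loop.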


---




[GitHub] spark issue #20823: [SPARK-23674] Add Spark ML Listener for Tracking ML Pipe...

2018-06-07 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/20823
  
@jmwdpk can you update this PR? There is a conflict. I have updated it in my branch: https://github.com/merlintang/spark/commits/SPARK-23674


---




[GitHub] spark issue #21455: [SPARK-24093][DStream][Minor]Make some fields of KafkaSt...

2018-05-29 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/21455
  
@jerryshao can you review this minor update?


---




[GitHub] spark pull request #21455: [SPARK-24093][DStream][Minor]Make some fields of ...

2018-05-29 Thread merlintang
GitHub user merlintang opened a pull request:

https://github.com/apache/spark/pull/21455

[SPARK-24093][DStream][Minor]Make some fields of KafkaStreamWriter/InternalRowMicroBatchWriter visible to outside of the classes

## What changes were proposed in this pull request?

This PR makes the relevant fields of KafkaStreamWriter and InternalRowMicroBatchWriter visible outside of the classes.

## How was this patch tested?
Manual tests.


Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/merlintang/spark “Spark-24093”

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21455.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21455


commit 6233528063996dabe780d5b04f874f22846e40d4
Author: Mingjie Tang 
Date:   2018-05-29T19:49:17Z

[SPARK-24093][DStream][Minor]Make some fields of 
KafkaStreamWriter/InternalRowMicroBatchWriter visible to outside of the classes




---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-17 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/19885
  
@jerryshao can you backport this to branch-2.2 as well?

Thanks


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-10 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/19885
  
@jerryshao and @steveloughran, thanks for your comments and review.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-09 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/19885
  
@steveloughran can you review the added system test cases? 


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2018-01-02 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/19885
  
My local test passes. I will set up a system test and update this soon.
Sorry about the delay.

On Tue, Jan 2, 2018 at 3:42 PM, Marcelo Vanzin 
wrote:

> Any updates?



---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-14 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/19885
  
I am sorry for the delay in adding the test function; I will update it soon.

On Thu, Dec 14, 2017 at 12:55 PM, UCB AMPLab 
wrote:

> Can one of the admins verify this patch?



---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-06 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/19885
  
I have added this test case for URI comparison based on Steve's comments. I have tested it in my local VM, and it passes.

Meanwhile, for hdfs://namenode1/path1 and hdfs://namenode1:8020/path2, the default HDFS port can be resolved, so they also match.

Below is the test case:

test("compare URI for filesystem") {

  // case 1
  var srcUri = new URI("file:///file1")
  var dstUri = new URI("file:///file2")
  assert(Client.compareUri(srcUri, dstUri) == true)

  // case 2
  srcUri = new URI("file:///c:file1")
  dstUri = new URI("file://c:file2")
  assert(Client.compareUri(srcUri, dstUri) == true)

  // case 3
  srcUri = new URI("file://host/file1")
  dstUri = new URI("file://host/file2")
  assert(Client.compareUri(srcUri, dstUri) == true)

  // case 4
  srcUri = new URI("wasb://bucket1@user")
  dstUri = new URI("wasb://bucket1@user/")
  assert(Client.compareUri(srcUri, dstUri) == true)

  // case 5
  srcUri = new URI("hdfs:/path1")
  dstUri = new URI("hdfs:/path2")
  assert(Client.compareUri(srcUri, dstUri) == true)

  // case 6
  srcUri = new URI("file:///file1")
  dstUri = new URI("file://host/file2")
  assert(Client.compareUri(srcUri, dstUri) == false)

  // case 7
  srcUri = new URI("file://host/file1")
  dstUri = new URI("file:///file2")
  assert(Client.compareUri(srcUri, dstUri) == false)

  // case 8
  srcUri = new URI("file://host/file1")
  dstUri = new URI("file://host2/file2")
  assert(Client.compareUri(srcUri, dstUri) == false)

  // case 9
  srcUri = new URI("wasb://bucket1@user")
  dstUri = new URI("wasb://bucket2@user/")
  assert(Client.compareUri(srcUri, dstUri) == false)

  // case 10
  srcUri = new URI("wasb://bucket1@user")
  dstUri = new URI("wasb://bucket1@user2/")
  assert(Client.compareUri(srcUri, dstUri) == false)

  // case 11
  srcUri = new URI("s3a://user@pass:bucket1/")
  dstUri = new URI("s3a://user2@pass2:bucket1/")
  assert(Client.compareUri(srcUri, dstUri) == false)

  // case 12
  srcUri = new URI("hdfs://namenode1/path1")
  dstUri = new URI("hdfs://namenode1:8080/path2")
  assert(Client.compareUri(srcUri, dstUri) == false)

  // case 13
  srcUri = new URI("hdfs://namenode1:8020/path1")
  dstUri = new URI("hdfs://namenode1:8080/path2")
  assert(Client.compareUri(srcUri, dstUri) == false)
}
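A minimal, stdlib-only sketch of the comparison these cases describe, including the default-port behaviour mentioned above. It is not Spark's `Client.compareUri`: the object name, the `defaultPorts` table (8020 for HDFS), and the fallback for URIs without a parsed host are assumptions made for illustration.

```scala
import java.net.URI
import java.util.Locale

object UriCompareSketch {
  // Assumed default port per scheme; real code may resolve this differently.
  private val defaultPorts: Map[String, Int] = Map("hdfs" -> 8020)

  private def lower(s: String): Option[String] = Option(s).map(_.toLowerCase(Locale.ROOT))

  // Explicit port if present, otherwise the assumed scheme default, otherwise -1.
  private def effectivePort(uri: URI): Int =
    if (uri.getPort != -1) uri.getPort
    else lower(uri.getScheme).flatMap(defaultPorts.get).getOrElse(-1)

  /** True if both URIs appear to refer to the same filesystem. */
  def sameFileSystem(src: URI, dst: URI): Boolean =
    if (lower(src.getScheme) != lower(dst.getScheme)) false
    else if (src.getHost == null || dst.getHost == null) {
      // No parsed host (e.g. file:///x or a registry-based authority): compare raw authorities.
      lower(src.getAuthority) == lower(dst.getAuthority)
    } else {
      // Server-based authority: user info exactly, host case-insensitively, port after defaulting.
      src.getUserInfo == dst.getUserInfo &&
        lower(src.getHost) == lower(dst.getHost) &&
        effectivePort(src) == effectivePort(dst)
    }
}
```

Defaulting the port before comparing is what makes `hdfs://namenode1/path1` and `hdfs://namenode1:8020/path2` match while `hdfs://namenode1:8080/path2` still differs.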




---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-04 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/19885
  
@jerryshao Yes, hdfs://us...@nn1.com:8020 and hdfs://us...@nn1.com:8020 would be considered two different filesystems, since the authority information should be taken into account. That is why we need to add the authority when checking whether two filesystems are the same.


---




[GitHub] spark pull request #19885: [SPARK-22587] Spark job fails if fs.defaultFS and...

2017-12-04 Thread merlintang
Github user merlintang commented on a diff in the pull request:

https://github.com/apache/spark/pull/19885#discussion_r154827513
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala 
---
@@ -1428,6 +1428,12 @@ private object Client extends Logging {
   return false
 }
 
+val srcAuthority = srcUri.getAuthority()
+val detAuthority = dstUri.getAuthority()
+if (srcAuthority != detAuthority || (srcAuthority != null && !srcAuthority.equalsIgnoreCase(detAuthority))) {
--- End diff --

Thanks all, I will update the PR soon.


---




[GitHub] spark issue #19885: [SPARK-22587] Spark job fails if fs.defaultFS and applic...

2017-12-04 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/19885
  
@jerryshao can you review this patch? 


---




[GitHub] spark pull request #19885: [SPARK-22587] Spark job fails if fs.defaultFS and...

2017-12-04 Thread merlintang
GitHub user merlintang opened a pull request:

https://github.com/apache/spark/pull/19885

[SPARK-22587] Spark job fails if fs.defaultFS and application jar are different url

## What changes were proposed in this pull request?

When comparing two filesystems, the authority of the URI is not considered. Therefore, we add the authority to the comparison: two filesystems with different authorities cannot be the same FS.
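A sketch of the authority check, with a hypothetical helper name `sameAuthority`. In Scala, `!=` on two `String`s is already a null-safe value comparison, but a case-sensitive one, so normalizing both sides to a lower-cased `Option[String]` expresses "equal ignoring case, with two missing authorities counting as equal" in a single comparison. Lower-casing the whole authority follows an `equalsIgnoreCase` intent; strictly, RFC 3986 treats only the host as case-insensitive, while user info is case-sensitive.

```scala
import java.net.URI
import java.util.Locale

object AuthorityCheck {
  // Hypothetical helper: null-safe, case-insensitive comparison of two URI authorities.
  def sameAuthority(src: URI, dst: URI): Boolean = {
    val srcAuthority = Option(src.getAuthority).map(_.toLowerCase(Locale.ROOT))
    val dstAuthority = Option(dst.getAuthority).map(_.toLowerCase(Locale.ROOT))
    // Two missing authorities (None == None) count as the same.
    srcAuthority == dstAuthority
  }
}
```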



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/merlintang/spark EAR-7377

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19885.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19885


commit 3675f0a41fc0715d3d7122bbff3ab6d8fbe057c9
Author: Mingjie Tang 
Date:   2017-12-04T23:31:31Z

SPARK-22587 Spark job fails if fs.defaultFS and application jar are 
different url




---




[GitHub] spark issue #16165: [SPARK-8617] [WEBUI] HistoryServer: Include in-progress ...

2017-04-05 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/16165
  
@markhamstra Thanks all.

BTW, what if there are many redundant in-progress files on disk that impact system performance?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark issue #16165: [SPARK-8617] [WEBUI] HistoryServer: Include in-progress ...

2017-04-04 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/16165
  
@vanzin sorry, I meant 2.1.1.


---



[GitHub] spark issue #16165: [SPARK-8617] [WEBUI] HistoryServer: Include in-progress ...

2017-04-04 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/16165
  
Should we backport this into 2.1? @vanzin


---



[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-28 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/17092
  
@Yunni I tested this patch locally and it works, but I have an idea to improve it. We can discuss it in another ticket.


---



[GitHub] spark issue #17092: [SPARK-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-28 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/17092
  
@Yunni OK, let us discuss further optimization in another ticket. The current patch LGTM.


---



[GitHub] spark issue #16965: [Spark-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-24 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/16965
  
@Yunni thanks. Where I mention L, it is the number of hash tables.

This way, the memory usage would be O(L*N), and the approximate NN search cost in one partition is O(L*N'), where N is the size of the input dataset and N' is the number of data points in one partition. Right?



---



[GitHub] spark issue #16965: [Spark-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-24 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/16965
  
@Yunni OK, if we want to move this forward faster, we can keep the current AND-OR implementation.

(2)(3) You mention that you explode the inner table (dataset). Does this mean that for each tuple of the inner table (say t_i) and multiple hash functions (say h_0, h_1, ..., h_l), you create multiple rows like (h_0, t_i), (h_1, t_i), ..., (h_l, t_i)? Am I correct?
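To make the question concrete, here is a sketch on plain Scala collections rather than DataFrames; it is not Spark's LSH implementation, and the names and toy hash functions are hypothetical. Each row is exploded into one `((tableIndex, hashValue), row)` entry per hash table, and two rows become a candidate pair if they share at least one key, which is OR-amplification across the tables.

```scala
object ExplodeJoinSketch {
  /** One entry per (hash-table index, hash value), i.e. (h_0, t_i), (h_1, t_i), ... */
  def explode[T](rows: Seq[T], hashes: Seq[T => Int]): Seq[((Int, Int), T)] =
    for {
      row <- rows
      (h, idx) <- hashes.zipWithIndex
    } yield ((idx, h(row)), row)

  /** Candidate pairs: rows that collide in at least one hash table (OR across tables). */
  def candidatePairs[T](left: Seq[T], right: Seq[T], hashes: Seq[T => Int]): Set[(T, T)] = {
    // Index the exploded right side by its (table, hash value) key.
    val rightByKey: Map[(Int, Int), Seq[T]] =
      explode(right, hashes).groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2) }
    explode(left, hashes).flatMap { case (key, l) =>
      rightByKey.getOrElse(key, Seq.empty).map(r => (l, r))
    }.toSet // dedupe pairs that collide in more than one table
  }
}
```

The exploded side has L rows per input row, which is where an O(L*N) memory cost comes from.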


---



[GitHub] spark issue #16965: [Spark-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-24 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/16965
  
@Yunni Yes, we can use AND-OR to amplify the probabilities by increasing numHashTables and numHashFunctions. As a further extension, if users have a hash function with a lower collision probability, OR-AND could be used.

(1) We do not need to change Array[Vector], numHashTables, or numHashFunctions; we need to change the function that computes the hash distance (i.e., hashDistance), as well as the sameBucket function in approxNearestNeighbors.

(3) For the similarity join, I have one question: if you do a join based on the hashed values of input tuples, the join key would be array(vector). Am I right? If so, does this satisfy OR-amplification? Please correct me if I am wrong.

(4) For the index part, I think it would work. It is quite similar to the routing-table idea in GraphX. Thus, I think we can create another DataFrame with the same partitioner as the input DataFrame; the newly created DataFrame would then contain the index for the input tables without disturbing the input DataFrame.

(5) The other major concern is memory overhead. Can we reduce the memory usage of the output hash values, i.e., array(vector)? Users have reported that the current approach uses a lot of memory. One option is to use bits to represent the hashed values for MinHash; another is to use sparse vectors. What do you think?
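A sketch of the "use bits" option, assuming it means keeping only the lowest b bits of each MinHash value (the b-bit minwise hashing idea) and packing them into `Long` words. All names are hypothetical, and this is not how Spark's `MinHashLSH` stores its output. Fewer bits per value save memory but make accidental matches between unrelated values more likely (about 1 in 2^b), so b trades memory for accuracy.

```scala
object BitPackedMinHash {
  /** Packs the lowest `b` bits of each hash value into 64-bit words. */
  def pack(hashes: Array[Int], b: Int): Array[Long] = {
    require(b >= 1 && b <= 32 && 64 % b == 0, "b must be at most 32 and divide 64")
    val perWord = 64 / b
    val mask = (1L << b) - 1
    val words = new Array[Long]((hashes.length + perWord - 1) / perWord)
    for (i <- hashes.indices) {
      val shift = (i % perWord) * b
      words(i / perWord) |= (hashes(i).toLong & mask) << shift
    }
    words
  }

  /** Reads back the i-th b-bit value. */
  def get(words: Array[Long], b: Int, i: Int): Int = {
    val perWord = 64 / b
    val mask = (1L << b) - 1
    ((words(i / perWord) >>> ((i % perWord) * b)) & mask).toInt
  }

  /** Fraction of the first n positions whose b-bit values agree. */
  def matchFraction(x: Array[Long], y: Array[Long], b: Int, n: Int): Double =
    (0 until n).count(i => get(x, b, i) == get(y, b, i)).toDouble / n
}
```

With b = 8, five `Int` hash values (20 bytes) fit in a single 8-byte `Long`.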


---



[GitHub] spark issue #16965: [Spark-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-23 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/16965
  
@Yunni I agree that the current NN search and join use AND-OR. We can discuss how to use OR-AND for these two searches as well.

The OR-AND option is useful when the effective threshold is low; please refer to the tables on pages 31 and 33 of:
http://web.stanford.edu/class/cs345a/slides/05-LSH.pdf

You can see that, when p is lower, OR-AND can amplify the hash family's collision probability from 0.0985 to 0.5740.
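The two constructions can be compared numerically. This sketch assumes an r-way AND and a b-way OR with r = b = 4 (an assumed setting) over a base collision probability p: AND-OR gives 1 - (1 - p^r)^b and OR-AND gives (1 - (1 - p)^b)^r. At p = 0.4 these evaluate to about 0.0985 and 0.5740 respectively.

```scala
object LshAmplification {
  /** r-way AND, then b-way OR: 1 - (1 - p^r)^b. */
  def andOr(p: Double, r: Int, b: Int): Double =
    1.0 - math.pow(1.0 - math.pow(p, r), b)

  /** b-way OR, then r-way AND: (1 - (1 - p)^b)^r. */
  def orAnd(p: Double, r: Int, b: Int): Double =
    math.pow(1.0 - math.pow(1.0 - p, b), r)

  def main(args: Array[String]): Unit =
    for (p <- Seq(0.2, 0.4, 0.6, 0.8))
      println(f"p = $p%.1f  AND-OR = ${andOr(p, 4, 4)}%.4f  OR-AND = ${orAnd(p, 4, 4)}%.4f")
}
```

AND-OR pushes low probabilities down (fewer false positives), while OR-AND lifts them up (fewer false negatives), which is why OR-AND suits a low effective threshold.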


---



[GitHub] spark issue #16965: [Spark-18450][ML] Scala API Change for LSH AND-amplifica...

2017-02-22 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/16965
  
It seems this patch provides AND-OR amplification. Can we also provide an option for users to choose OR-AND amplification?


---



[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...

2017-01-06 Thread merlintang
Github user merlintang closed the pull request at:

https://github.com/apache/spark/pull/15819


---



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2017-01-06 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
Many thanks, Xiao. I learned a lot.


---



[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...

2017-01-05 Thread merlintang
Github user merlintang commented on a diff in the pull request:

https://github.com/apache/spark/pull/15819#discussion_r94906952
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +219,37 @@ class VersionsSuite extends SparkFunSuite with Logging {
 "as 'COMPACT' WITH DEFERRED REBUILD")
   client.reset()
 }
+
+test(s"$version: CREATE TABLE AS SELECT") {
+  withTable("tbl") {
+sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
+assert(sqlContext.table("tbl").collect().toSeq == Seq(Row(1)))
+  }
+}
+
+test(s"$version: Delete the temporary staging directory and files after each insert") {
+  import sqlContext.implicits._
--- End diff --

Thanks, Xiao. I have reverted that and tested it locally.


---



[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...

2017-01-05 Thread merlintang
Github user merlintang commented on a diff in the pull request:

https://github.com/apache/spark/pull/15819#discussion_r94727237
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +219,37 @@ class VersionsSuite extends SparkFunSuite with Logging {
 "as 'COMPACT' WITH DEFERRED REBUILD")
   client.reset()
 }
+
+test(s"$version: CREATE TABLE AS SELECT") {
+  withTable("tbl") {
+sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
+assert(sqlContext.table("tbl").collect().toSeq == Seq(Row(1)))
+  }
+}
+
+test(s"$version: Delete the temporary staging directory and files after each insert") {
+  withTempDir { tmpDir =>
+withTable("tab", "tbl") {
+  sqlContext.sql(
+s"""
+   |CREATE  TABLE tab(c1 string)
--- End diff --

Thanks, it is updated.


---



[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...

2017-01-05 Thread merlintang
Github user merlintang commented on a diff in the pull request:

https://github.com/apache/spark/pull/15819#discussion_r94727256
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +219,37 @@ class VersionsSuite extends SparkFunSuite with Logging {
 "as 'COMPACT' WITH DEFERRED REBUILD")
   client.reset()
 }
+
+test(s"$version: CREATE TABLE AS SELECT") {
+  withTable("tbl") {
+sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
+assert(sqlContext.table("tbl").collect().toSeq == Seq(Row(1)))
+  }
+}
+
+test(s"$version: Delete the temporary staging directory and files after each insert") {
+  withTempDir { tmpDir =>
+withTable("tab", "tbl") {
+  sqlContext.sql(
+s"""
+   |CREATE  TABLE tab(c1 string)
+   |location '${tmpDir.toURI.toString}'
+ """.stripMargin)
+
+  import sqlContext.implicits._
--- End diff --

Done


---



[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...

2017-01-05 Thread merlintang
Github user merlintang commented on a diff in the pull request:

https://github.com/apache/spark/pull/15819#discussion_r94727246
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
 ---
@@ -54,6 +63,63 @@ case class InsertIntoHiveTable(
   @transient private lazy val hiveContext = new Context(sc.hiveconf)
   @transient private lazy val catalog = sc.catalog
 
+  @transient var createdTempDir: Option[Path] = None
+  val stagingDir = new HiveConf().getVar(HiveConf.ConfVars.STAGINGDIR)
+
+  private def executionId: String = {
+val rand: Random = new Random
+val format: SimpleDateFormat = new SimpleDateFormat("-MM-dd_HH-mm-ss_SSS")
+val executionId: String = "hive_" + format.format(new Date) + "_" + Math.abs(rand.nextLong)
+ executionId
--- End diff --

Done! Thanks, Xiao. 


---



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2017-01-03 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
@gatorsmile Can you retest the patch so that we can merge it? Sorry to ping you multiple times; several users are asking about this. 


---



[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...

2017-01-02 Thread merlintang
Github user merlintang commented on a diff in the pull request:

https://github.com/apache/spark/pull/15819#discussion_r94361979
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +218,37 @@ class VersionsSuite extends SparkFunSuite with Logging {
 "as 'COMPACT' WITH DEFERRED REBUILD")
   client.reset()
 }
+
+test(s"$version: CREATE TABLE AS SELECT") {
+  withTable("tbl") {
+sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
+assert(sqlContext.table("tbl").collect().toSeq == Seq(Row(1)))
+  }
+}
+
+test(s"$version: Delete the temporary staging directory and files after each insert") {
+  withTempDir { tmpDir =>
+withTable("tab", "tbl") {
+  sqlContext.sql(
+s"""
+   |CREATE  TABLE tab(c1 string)
+   |location '${tmpDir.toURI.toString}'
+ """.stripMargin)
+
+  sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
--- End diff --

Sorry, Xiao, I called you Tao because one of my best friends is named Tao. :) It is updated now. Thanks again. 


---



[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...

2017-01-02 Thread merlintang
Github user merlintang commented on a diff in the pull request:

https://github.com/apache/spark/pull/15819#discussion_r94359244
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +218,37 @@ class VersionsSuite extends SparkFunSuite with Logging {
 "as 'COMPACT' WITH DEFERRED REBUILD")
   client.reset()
 }
+
+test(s"$version: CREATE TABLE AS SELECT") {
+  withTable("tbl") {
+sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
+assert(sqlContext.table("tbl").collect().toSeq == Seq(Row(1)))
+  }
+}
+
+test(s"$version: Delete the temporary staging directory and files after each insert") {
+  withTempDir { tmpDir =>
+withTable("tab", "tbl") {
+  sqlContext.sql(
+s"""
+   |CREATE  TABLE tab(c1 string)
+   |location '${tmpDir.toURI.toString}'
+ """.stripMargin)
+
+  sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
--- End diff --

Thanks, Tao. I created a DataFrame and then registered it as a temp table, as follows:
 
 val df = sqlContext.createDataFrame((1 to 2).map(i => (i, "a"))).toDF("key", "value")
 df.select("value").repartition(1).registerTempTable("tbl")

It works, but it looks a bit convoluted. What do you think? 


---



[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...

2017-01-02 Thread merlintang
Github user merlintang commented on a diff in the pull request:

https://github.com/apache/spark/pull/15819#discussion_r94351849
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
 ---
@@ -54,6 +63,63 @@ case class InsertIntoHiveTable(
   @transient private lazy val hiveContext = new Context(sc.hiveconf)
   @transient private lazy val catalog = sc.catalog
 
+  @transient var createdTempDir: Option[Path] = None
+  val stagingDir = new HiveConf().getVar(HiveConf.ConfVars.STAGINGDIR)
+
+  private def executionId: String = {
+val rand: Random = new Random
+val format: SimpleDateFormat = new SimpleDateFormat("-MM-dd_HH-mm-ss_SSS")
+val executionId: String = "hive_" + format.format(new Date) + "_" + Math.abs(rand.nextLong)
+return executionId
--- End diff --

done.
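For readers following the `executionId` change, here is a minimal standalone sketch in plain Scala; it is not the code from this PR. The timestamp pattern `yyyy-MM-dd_HH-mm-ss_SSS` is an assumption inferred from the staging directory names reported later in this thread (e.g. `.hive-staging_hive_2016-12-06_18-17-48_899_1400956608265117052-5`), and the object name `ExecutionIds` is hypothetical. The sketch also swaps `Math.abs(rand.nextLong)` for masking off the sign bit, because `Math.abs(Long.MinValue)` is still negative.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Random}

object ExecutionIds {
  // Assumed pattern, inferred from staging directory names seen in this thread.
  private val Pattern = "yyyy-MM-dd_HH-mm-ss_SSS"

  // The last expression is the method's value, so no explicit `return` is needed.
  // `nextLong() & Long.MaxValue` clears the sign bit and is never negative,
  // unlike Math.abs, which returns Long.MinValue unchanged.
  def executionId(rand: Random = new Random, now: Date = new Date): String = {
    val format = new SimpleDateFormat(Pattern)
    "hive_" + format.format(now) + "_" + (rand.nextLong() & Long.MaxValue)
  }
}
```

Calling `ExecutionIds.executionId()` yields an ID of the form `hive_<timestamp>_<non-negative long>`, which is the shape that appears in the staging directory names quoted below.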


---



[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...

2017-01-02 Thread merlintang
Github user merlintang commented on a diff in the pull request:

https://github.com/apache/spark/pull/15819#discussion_r94351862
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +218,37 @@ class VersionsSuite extends SparkFunSuite with Logging {
 "as 'COMPACT' WITH DEFERRED REBUILD")
   client.reset()
 }
+
+test(s"$version: CREATE TABLE AS SELECT") {
+  withTable("tbl") {
+sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
+assert(sqlContext.table("tbl").collect().toSeq == Seq(Row(1)))
+  }
+}
+
+test(s"$version: Delete the temporary staging directory and files after each insert") {
+  withTempDir { tmpDir =>
+withTable("tab", "tbl") {
+  sqlContext.sql(
+s"""
+   |CREATE  TABLE tab(c1 string)
+   |location '${tmpDir.toURI.toString}'
+ """.stripMargin)
+
+  sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
--- End diff --

Are temporary views supported in 1.6.x? I tried using the HiveContext to create the view, but it did not work. Because this is a small test case, creating a table here should be OK. Please advise. Thanks so much, Tao.  


---



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2017-01-01 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
@gatorsmile I have backported the test case from #16339 with a small modification, because "INSERT OVERWRITE TABLE tab SELECT '$i'" triggers an issue on the Hive side (e.g., https://issues.apache.org/jira/browse/HIVE-12200). Instead, I create a temp table and insert data by selecting from it. Please double-check and verify. 


---



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2016-12-29 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
Yes, let me backport the test cases that check the staging files.

On Thu, Dec 29, 2016 at 10:11 PM, Xiao Li  wrote:

> Is that possible to backport the test cases in #16399
> <https://github.com/apache/spark/pull/16399>?



---



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2016-12-29 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
Thanks, Wenchen. I have backported the code from #16339 here and tested it locally. Can you review and verify?

On Sun, Dec 25, 2016 at 11:04 PM, Wenchen Fan wrote:

> #16399 <https://github.com/apache/spark/pull/16399> has been merged, feel
> free if you wanna backport to 1.6



---



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2016-12-20 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
@gatorsmile Great! Thanks so much; I have been pinged multiple times about this bug. :)


---



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2016-12-20 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
@cloud-fan @gatorsmile I have backported the code from #16134. Can you verify it and backport this to Spark 1.6.x?  


---



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2016-12-19 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
@gatorsmile One more customer has run into this issue in Spark 1.6.x. I backported the code from #16134 here and tested it manually. Please verify. 


---



[GitHub] spark issue #16134: [SPARK-18703] [SQL] Drop Staging Directories and Data Fi...

2016-12-15 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/16134
  
This patch is related to PR #15819 for Spark 1.6. I can now add the code from this patch (#16134) to #15819, which would fix the staging-file issue in Spark 1.6.x.

On Thu, Dec 15, 2016 at 12:54 PM, Reynold Xin wrote:

> sounds good to backport into 2.x branches. We can also backport into 1.6
> if it is easy.



---



[GitHub] spark issue #16134: [SPARK-18703] [SQL] Drop Staging Directories and Data Fi...

2016-12-15 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/16134
  
+1 to backporting to Spark 1.6.x

On Thu, Dec 15, 2016 at 8:14 AM, Xiao Li  wrote:

> The staging directory and files will not be removed when users hitting
> abnormal termination of JVM. In addition, if the JVM does not stop, these
> temporary files could still consume a lot of spaces. Thus, I think we need
> to backport it. However, I am not sure whether we should backport it to all
> the previous versions (2.1, 2.0 and 1.6)
>
> @rxin <https://github.com/rxin> Could you please make a decision?



---



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2016-12-14 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
@cloud-fan @gatorsmile This patch is related to #16134, which seems likely to be merged soon. Meanwhile, should we backport #16104 into 1.6.x? Please advise. Otherwise, should I just backport #16134 and #12770 to Spark 1.6.x?   


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16134: [SPARK-18703] [SQL] Drop Staging Directories and ...

2016-12-13 Thread merlintang
Github user merlintang commented on a diff in the pull request:

https://github.com/apache/spark/pull/16134#discussion_r92244682
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
 ---
@@ -328,6 +332,15 @@ case class InsertIntoHiveTable(
 holdDDLTime)
 }
 
+// Attempt to delete the staging directory and the inclusive files. If failed, the files are
+// expected to be dropped at the normal termination of VM since deleteOnExit is used.
+try {
+  createdTempDir.foreach { path => path.getFileSystem(hadoopConf).delete(path, true) }
+} catch {
+  case NonFatal(e) =>
+logWarning(s"Unable to delete staging directory: $stagingDir.\n" + e)
+}
+
 // Invalidate the cache.
 sqlContext.sharedState.cacheManager.invalidateCache(table)
--- End diff --

Should we delete the staging files before or after invalidateCache? Does it matter? Logically, we should invalidate the cache first and then remove the intermediate data, so that the cache can be rebuilt from the files on disk. Am I right? Please clarify. 
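A best-effort cleanup step in the spirit of the `try`/`NonFatal` block in the diff above can be sketched with `java.nio` on a local filesystem. This is a stand-in under stated assumptions: the real code calls Hadoop's `FileSystem.delete(path, true)`, whereas this sketch walks the tree itself and prints to stderr instead of calling `logWarning`. `StagingCleanup.deleteQuietly` is a hypothetical name, and the sketch covers only the delete step, not where it belongs relative to `invalidateCache`.

```scala
import java.nio.file.{Files, Path}
import java.util.Comparator
import scala.util.control.NonFatal

object StagingCleanup {
  // Best-effort recursive delete of an optional staging directory.
  // Returns true if there was nothing to delete or deletion succeeded,
  // and false (after reporting) if a non-fatal error occurred.
  def deleteQuietly(dir: Option[Path]): Boolean = dir.forall { path =>
    try {
      if (Files.exists(path)) {
        val entries = Files.walk(path)
        try {
          // Reverse order visits children before their parent directories.
          entries
            .sorted(Comparator.reverseOrder[Path]())
            .forEach((p: Path) => Files.delete(p))
        } finally entries.close()
      }
      true
    } catch {
      case NonFatal(e) =>
        System.err.println(s"Unable to delete staging directory: $path. $e")
        false
    }
  }
}
```

Catching only `NonFatal` mirrors the diff: an I/O failure is reported and swallowed so the insert itself still succeeds, while fatal JVM errors still propagate.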


---



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2016-12-13 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
Great. Once #16134 <https://github.com/apache/spark/pull/16134> is done, we can backport them together.

On Tue, Dec 13, 2016 at 12:18 AM, Wenchen Fan wrote:

> yea, I think we should backport a complete staging dir cleanup
> functionality to 1.6, let's wait for #16134
> <https://github.com/apache/spark/pull/16134>



---



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2016-12-12 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
@gatorsmile What is the status of this patch? It is a backport, so can you merge it into 1.6.x? More than one user has run into this issue in Spark 1.6.x. 


---



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2016-12-06 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
Did you exit the Spark shell? I have tested this, and under Spark 2.0.x the staging files are removed after we exit the Spark shell.

Meanwhile, Hive uses the staging files to write data, and if a Hive insert fails partway through, the staging files could still be used.

On Tue, Dec 6, 2016 at 5:09 PM, lichenglin  wrote:

> here is some result for du -h --max-depth=1 .
> 3.3G ./.hive-staging_hive_2016-12-06_18-17-48_899_1400956608265117052-5
> 13G ./.hive-staging_hive_2016-12-06_15-43-35_928_6647980494630196053-5
> 8.6G ./.hive-staging_hive_2016-12-06_17-05-51_951_8422682528744006964-5
> 9.7G ./.hive-staging_hive_2016-12-06_17-14-44_748_6947381677226271245-5
> 9.2G ./day=2016-12-01
> 8.5G ./day=2016-11-19
>
> I run a sql like insert overwrite db.table partition(day='2016-12-06')
> select * from tmpview everyday
> each sql create a "hive-staging folder".
>
> Can I delete the folders manually??
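For the question in the quoted report, one way to find leftover staging directories next to partition directories is to list the table directory and keep only names with the staging prefix. This is a local-filesystem sketch using `java.nio`; on HDFS the same idea would go through Hadoop's `FileSystem` API instead. The prefix `.hive-staging` and the object name `StagingScan` are assumptions. As noted above, Hive uses the staging files while it writes data, so removing any of them is only reasonable once no insert is still running against that table.

```scala
import java.nio.file.{Files, Path}

object StagingScan {
  // Lists direct children of a table directory whose names start with the
  // staging prefix. Partition directories such as "day=2016-12-01" are skipped.
  def leftoverStagingDirs(tableDir: Path, prefix: String = ".hive-staging"): Seq[Path] = {
    val children = Files.list(tableDir)
    try {
      val buf = Seq.newBuilder[Path]
      children.forEach { (p: Path) =>
        if (Files.isDirectory(p) && p.getFileName.toString.startsWith(prefix)) buf += p
      }
      buf.result().sortBy(_.getFileName.toString)
    } finally children.close()
  }
}
```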



---



[GitHub] spark issue #13670: [SPARK-15951] Change Executors Page to use datatables to...

2016-12-06 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/13670
  
@kishorvpatil 
You added the allexecutors endpoint, which returns information about both dead and active executors.

The monitoring documentation (http://spark.apache.org/docs/latest/monitoring.html) describes the endpoint as:
/applications/[app-id]/executors    A list of all executors for the given application.

For the 2.1 release, we should document more clearly what each endpoint means:
/applications/[app-id]/executors xxx
/applications/[app-id]/allexecutors  

This is confusing; our tests have already run into this issue. 
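To compare the two endpoints side by side, a client only needs to build the URLs under the REST API's `/api/v1` mount point (typically `http://localhost:4040/api/v1` for a running application and `http://<server-url>:18080/api/v1` for the history server). The helper below is hypothetical and not part of Spark, and the application ID used with it is made up. Per the comment above, `allexecutors` also includes dead executors.

```scala
import scala.io.Source

object ExecutorEndpoints {
  // `base` is the REST API root, e.g. "http://localhost:4040/api/v1".
  def executorsUrl(base: String, appId: String): String =
    s"${base.stripSuffix("/")}/applications/$appId/executors"

  def allExecutorsUrl(base: String, appId: String): String =
    s"${base.stripSuffix("/")}/applications/$appId/allexecutors"

  // Fetches the JSON body; requires the UI or history server to be reachable.
  def fetch(url: String): String = {
    val src = Source.fromURL(url)
    try src.mkString finally src.close()
  }
}
```

For example, `ExecutorEndpoints.fetch(ExecutorEndpoints.allExecutorsUrl("http://localhost:4040/api/v1", "app-20161206000000-0001"))` would return the JSON list, assuming an application with that ID is reachable at that address.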



---



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2016-12-04 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
@cloud-fan  this is related to this PR in the 2.0.x
https://github.com/apache/spark/pull/12770



---



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2016-12-04 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
Ok.

On Sun, Dec 4, 2016 at 6:25 PM, Reynold Xin wrote:

> We have stopped making new releases for 1.5 so it makes no sense to
> backport.



---



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2016-12-04 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
This bug affects 1.5.x as well as 1.6.x. Please backport it to 1.5.x too.

On Sun, Dec 4, 2016 at 6:20 PM, Reynold Xin wrote:

> If it is a bug fix and low risk, sure.



---



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2016-12-04 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
It is updated.

On Sun, Dec 4, 2016 at 11:23 AM, Xiao Li  wrote:

> @merlintang <https://github.com/merlintang> Could you please add
> [Branch-1.6] in your PR title?



---



[GitHub] spark issue #15819: [SPARK-18372][SQL].Staging directory fail to be removed

2016-12-04 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
Yes, exactly. This patch is only for Spark 1.x. What I proposed here is to use the Spark 2.0.x code to fix the bug in Spark 1.x; you can see this in my previous replies. I do not want to change the code, since that would make 1.x and 2.x diverge greatly.

On Sun, Dec 4, 2016 at 10:08 AM, Xiao Li  wrote:

> *@gatorsmile* commented on this pull request.
> --
>
> In sql/hive/src/main/scala/org/apache/spark/sql/hive/
> execution/InsertIntoHiveTable.scala
> <https://github.com/apache/spark/pull/15819>:
>
> > +  } else {
> +inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
> +  }
> +val dir: Path =
> +  fs.makeQualified(
> +new Path(stagingPathName + "_" + executionId + "-" + TaskRunner.getTaskRunnerID))
> +logDebug("Created staging dir = " + dir + " for path = " + inputPath)
> +try {
> +  if (!FileUtils.mkdir(fs, dir, true, hadoopConf)) {
> +throw new IllegalStateException("Cannot create staging directory  '" + dir.toString + "'")
> +  }
> +  fs.deleteOnExit(dir)
> +}
> +catch {
> +  case e: IOException =>
> +throw new RuntimeException(
>
> Almost all the codes in this PR are copied from the existing master. This
> PR is just for branch 1.6



---



[GitHub] spark pull request #15819: [SPARK-18372][SQL].Staging directory fail to be r...

2016-11-19 Thread merlintang
Github user merlintang commented on a diff in the pull request:

https://github.com/apache/spark/pull/15819#discussion_r88778830
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
 ---
@@ -54,6 +61,61 @@ case class InsertIntoHiveTable(
   @transient private lazy val hiveContext = new Context(sc.hiveconf)
   @transient private lazy val catalog = sc.catalog
 
+  val stagingDir = new HiveConf().getVar(HiveConf.ConfVars.STAGINGDIR)
+
+  private def executionId: String = {
+    val rand: Random = new Random
+    val format: SimpleDateFormat = new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss_SSS")
+    val executionId: String = "hive_" + format.format(new Date) + "_" + Math.abs(rand.nextLong)
+    return executionId
+  }
+
+  private def getStagingDir(inputPath: Path, hadoopConf: Configuration): Path = {
+    val inputPathUri: URI = inputPath.toUri
+    val inputPathName: String = inputPathUri.getPath
+    val fs: FileSystem = inputPath.getFileSystem(hadoopConf)
+    val stagingPathName: String =
+      if (inputPathName.indexOf(stagingDir) == -1) {
+        new Path(inputPathName, stagingDir).toString
+      } else {
+        inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
+      }
+    val dir: Path =
+      fs.makeQualified(
+        new Path(stagingPathName + "_" + executionId + "-" + TaskRunner.getTaskRunnerID))
+    logDebug("Created staging dir = " + dir + " for path = " + inputPath)
+    try {
+      if (!FileUtils.mkdir(fs, dir, true, hadoopConf)) {
+        throw new IllegalStateException("Cannot create staging directory  '" + dir.toString + "'")
+      }
+      fs.deleteOnExit(dir)
+    }
+    catch {
+      case e: IOException =>
+        throw new RuntimeException(
--- End diff --

We use this code for the following reasons:
(1) The old version relies on the Hive package to create the staging directory. In the Hive code, staging directories are stored in a hash map and are removed only when the session is closed. However, our Spark code never triggers the Hive session close, so these directories are never removed.
(2) The pushed code simulates the Hive approach, creating the staging directory inside Spark instead of relying on Hive. Because it registers the directory with fs.deleteOnExit(dir), the staging directory is removed when the FileSystem is closed.
(3) I will fix the return type issue. Thanks for your comments, @srowen.
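The lifecycle in point (1) can be illustrated with a small local-filesystem sketch. Everything here is hypothetical: SessionScopedStaging is not a Hive or Spark class, it uses java.nio.file instead of the Hadoop FileSystem API, and it only models the idea that directories remembered by a session are deleted when close() is called, so a caller that never closes the session leaks them.

```scala
import java.nio.file.{Files, Path}
import scala.collection.mutable

// Hypothetical model of a Hive-style session (not the real Hive Context class):
// staging directories are remembered in a map and deleted only in close().
final class SessionScopedStaging(root: Path) {
  private val stagingDirs = mutable.LinkedHashMap.empty[String, Path]

  // Creates (once per key) an empty staging directory under root and remembers it.
  def stagingDirFor(key: String): Path =
    stagingDirs.getOrElseUpdate(key, Files.createDirectories(root.resolve(".hive-staging_" + key)))

  // Number of directories the session is currently tracking.
  def tracked: Int = stagingDirs.size

  // The only cleanup path: if close() is never called, the directories stay on disk.
  // Files.deleteIfExists only removes empty directories, which is all this sketch creates.
  def close(): Unit = {
    stagingDirs.values.foreach(dir => Files.deleteIfExists(dir))
    stagingDirs.clear()
  }
}
```

Calling stagingDirFor without a later close() leaves the directory on disk, which is the leak described in point (1). The pushed code sidesteps this by registering each directory with fs.deleteOnExit(dir), so cleanup is tied to the Hadoop FileSystem being closed rather than to a Hive session.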





[GitHub] spark pull request #15819: [SPARK-18372][SQL].Staging directory fail to be r...

2016-11-19 Thread merlintang
Github user merlintang commented on a diff in the pull request:

https://github.com/apache/spark/pull/15819#discussion_r88778781
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -54,6 +61,61 @@ case class InsertIntoHiveTable(
   @transient private lazy val hiveContext = new Context(sc.hiveconf)
   @transient private lazy val catalog = sc.catalog
 
+  val stagingDir = new HiveConf().getVar(HiveConf.ConfVars.STAGINGDIR)
+
+  private def executionId: String = {
+    val rand: Random = new Random
+    val format: SimpleDateFormat = new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss_SSS")
+    val executionId: String = "hive_" + format.format(new Date) + "_" + Math.abs(rand.nextLong)
--- End diff --

Yes, it is. I wrote it this way because I want the code to be exactly the
same as in the Spark 2.0.x version.





[GitHub] spark issue #15819: [SPARK-18372][SQL].Staging directory fail to be removed

2016-11-16 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
@cloud-fan @rxin, can you review this code? Several customers are
complaining about the empty staging files that Hive generates in HDFS.





[GitHub] spark pull request #15819: [SPARK-18372][SQL].Staging directory fail to be r...

2016-11-16 Thread merlintang
Github user merlintang commented on a diff in the pull request:

https://github.com/apache/spark/pull/15819#discussion_r88345264
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -54,6 +61,61 @@ case class InsertIntoHiveTable(
   @transient private lazy val hiveContext = new Context(sc.hiveconf)
   @transient private lazy val catalog = sc.catalog
 
+  val stagingDir = new HiveConf().getVar(HiveConf.ConfVars.STAGINGDIR)
+
+  private def executionId: String = {
+    val rand: Random = new Random
+    val format: SimpleDateFormat = new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss_SSS")
+    val executionId: String = "hive_" + format.format(new Date) + "_" + Math.abs(rand.nextLong)
+    return executionId
--- End diff --

Hi @fidato13, this is OK, since this part of the code is reused from Spark
2.0.2.
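For comparison, the same hive_<timestamp>_<random> ID scheme from the diff can be written as a single expression without the explicit return. This is only an illustrative sketch: the ExecutionIds object and its parameters are hypothetical, and the injectable Date and Random exist just to make the output easy to check.

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale}
import scala.util.Random

object ExecutionIds {
  // Builds "hive_<yyyy-MM-dd_HH-mm-ss_SSS>_<abs(random long)>", like executionId in the diff.
  // Caveat shared with the original: math.abs(Long.MinValue) is still negative,
  // so in that (very unlikely) case the suffix starts with '-'.
  def executionId(now: Date = new Date, rand: Random = new Random): String = {
    // A fresh formatter per call, because SimpleDateFormat is not thread-safe.
    val format = new SimpleDateFormat("yyyy-MM-dd_HH-mm-ss_SSS", Locale.US)
    "hive_" + format.format(now) + "_" + math.abs(rand.nextLong())
  }
}
```

Passing a fixed Date and a seeded Random makes the result deterministic, which is convenient in tests; production callers would use the defaults.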





[GitHub] spark issue #15819: [SPARK-18372][SQL].Staging directory fail to be removed

2016-11-09 Thread merlintang
Github user merlintang commented on the issue:

https://github.com/apache/spark/pull/15819
  
Actually, I do not have a unit test, but the code below (the same code we
posted in the JIRA) reproduces this bug.

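The code is as follows:
// Assumes spark-shell (Spark 1.5.x/1.6.x), where sc is the predefined SparkContext.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("CREATE TABLE IF NOT EXISTS T1 (key INT, value STRING)")
// Load the kv1.txt sample from the Spark source tree (adjust the relative path as needed).
sqlContext.sql("LOAD DATA LOCAL INPATH '../examples/src/main/resources/kv1.txt' INTO TABLE T1")
sqlContext.sql("CREATE TABLE IF NOT EXISTS T2 (key INT, value STRING)")
// Copy T1 into the Hive table T2; this insert is where the staging directory is left behind.
val sparktestdf = sqlContext.table("T1")
val dfw = sparktestdf.write
dfw.insertInto("T2")
// Read T2 back to confirm the copy.
val sparktestcopypydfdf = sqlContext.sql("""SELECT * from T2 """)
sparktestcopypydfdf.show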

Our customers and we have also reproduced this bug manually on Spark
1.6.x and 1.5.x.

As for a unit test: because we do not know how to locate the Hive directory
of the table inside a test case, we cannot check at the end whether the
staging directory was removed.
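If a test could obtain the table's location as a local path (for example, a local warehouse directory; this is an assumption, not something the PR does), the check could look for leftover directories whose names start with the staging prefix. Since getStagingDir builds the name as new Path(inputPathName, stagingDir) followed by "_" + executionId + "-" + the task runner ID, these directories appear as children of the input path. The StagingCheck helper below is hypothetical, and its default prefix ".hive-staging" is Hive's default value of hive.exec.stagingdir.

```scala
import java.nio.file.{Files, Path}
import scala.collection.mutable.ListBuffer

object StagingCheck {
  // Hypothetical test helper: lists the direct child directories of tableDir whose
  // names start with the staging prefix (".hive-staging" by default in Hive).
  def leftoverStagingDirs(tableDir: Path, prefix: String = ".hive-staging"): List[Path] = {
    val stream = Files.newDirectoryStream(tableDir)
    try {
      val found = ListBuffer.empty[Path]
      val it = stream.iterator()
      while (it.hasNext) {
        val child = it.next()
        if (Files.isDirectory(child) && child.getFileName.toString.startsWith(prefix)) found += child
      }
      found.toList
    } finally stream.close()
  }
}
```

Because the fix relies on fs.deleteOnExit, such a check would only pass after the FileSystem has been closed (or the JVM has exited), so where the assertion runs needs care.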

The fix reuses three functions from Spark 2.0.2 to create the staging
directory, which resolves this bug.
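The path-naming part of getStagingDir in the diff is plain string logic, so it can be sketched and checked without Hadoop. This is an approximation under stated assumptions: StagingPaths is a hypothetical helper, new Path(parent, child) is replaced by joining with '/', which only matches for a relative stagingDir such as the default ".hive-staging", and the executionId and task runner ID are passed in as plain strings.

```scala
object StagingPaths {
  // If the input path already contains the staging name, keep the prefix up to and
  // including it instead of nesting a second staging directory; otherwise place the
  // staging name directly under the input path (approximating new Path(parent, child)).
  def stagingPathName(inputPathName: String, stagingDir: String): String = {
    val idx = inputPathName.indexOf(stagingDir)
    if (idx == -1) inputPathName.stripSuffix("/") + "/" + stagingDir
    else inputPathName.substring(0, idx + stagingDir.length)
  }

  // Final directory name: <stagingPathName>_<executionId>-<taskRunnerId>, as in the diff.
  def stagingDirName(inputPathName: String, stagingDir: String,
                     executionId: String, taskRunnerId: String): String =
    stagingPathName(inputPathName, stagingDir) + "_" + executionId + "-" + taskRunnerId
}
```

For a table at /user/hive/warehouse/t2 this yields /user/hive/warehouse/t2/.hive-staging_<executionId>-<taskRunnerId>, which is why the leftover directories show up inside the table's own location.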


On Wed, Nov 9, 2016 at 10:26 PM, Wenchen Fan wrote:

> do you have a unit test to reproduce this bug?
>






[GitHub] spark pull request #15819: [SPARK-18372][SQL].Staging directory fail to be r...

2016-11-08 Thread merlintang
GitHub user merlintang opened a pull request:

https://github.com/apache/spark/pull/15819

[SPARK-18372][SQL].Staging directory fail to be removed 

## What changes were proposed in this pull request?

This fix addresses the bug
https://issues.apache.org/jira/browse/SPARK-18372 .
InsertIntoHiveTable generates a staging directory, but this directory is
never removed afterwards.

## How was this patch tested?
manual tests

Author: Mingjie Tang 


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/merlintang/spark branch-1.6

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15819.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15819


commit ac65375a64c2a8a2fe019dc0e2c031f413df74b8
Author: Mingjie Tang 
Date:   2016-11-09T00:41:32Z

SPARK-18372



