[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...

2015-11-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/9417


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...

2015-11-05 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/9417#issuecomment-154001929
  
Merged to master


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...

2015-11-04 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9417#discussion_r43869486
  
--- Diff: 
core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---
@@ -177,39 +170,24 @@ class PortableDataStream(
   }
 
   /**
-   * Create a new DataInputStream from the split and context
+   * Create a new DataInputStream from the split and context. The user of 
this method is responsible
+   * for closing the stream after usage.
*/
   def open(): DataInputStream = {
-if (!isOpen) {
-  val pathp = split.getPath(index)
-  val fs = pathp.getFileSystem(conf)
-  fileIn = fs.open(pathp)
-  isOpen = true
-}
-fileIn
+val pathp = split.getPath(index)
+val fs = pathp.getFileSystem(conf)
+fs.open(pathp)
   }
 
   /**
* Read the file as a byte array
*/
   def toArray(): Array[Byte] = {
-open()
-val innerBuffer = ByteStreams.toByteArray(fileIn)
-close()
-innerBuffer
-  }
-
-  /**
-   * Close the file (if it is currently open)
-   */
-  def close(): Unit = {
--- End diff --

My only last concern here is removing this method. It's `@Experimental` 
though it has been around a while. We could just keep the method as a no-op, in 
case some caller out there is calling it. Maybe deprecate it, and log a warning 
too. I'm not so fussed about the behavior change since it's again 
`@Experimental` and I think callers should have been closing the stream 
directly anyway.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...

2015-11-04 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/9417#issuecomment-153696906
  
LGTM. Let me leave it open a bit for comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...

2015-11-04 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/9417#discussion_r43870851
  
--- Diff: 
core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---
@@ -177,39 +170,24 @@ class PortableDataStream(
   }
 
   /**
-   * Create a new DataInputStream from the split and context
+   * Create a new DataInputStream from the split and context. The user of 
this method is responsible
+   * for closing the stream after usage.
*/
   def open(): DataInputStream = {
-if (!isOpen) {
-  val pathp = split.getPath(index)
-  val fs = pathp.getFileSystem(conf)
-  fileIn = fs.open(pathp)
-  isOpen = true
-}
-fileIn
+val pathp = split.getPath(index)
+val fs = pathp.getFileSystem(conf)
+fs.open(pathp)
   }
 
   /**
* Read the file as a byte array
*/
   def toArray(): Array[Byte] = {
-open()
-val innerBuffer = ByteStreams.toByteArray(fileIn)
-close()
-innerBuffer
-  }
-
-  /**
-   * Close the file (if it is currently open)
-   */
-  def close(): Unit = {
--- End diff --

Added a deprecated ```close()``` method.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...

2015-11-03 Thread kmader
Github user kmader commented on the pull request:

https://github.com/apache/spark/pull/9417#issuecomment-153550148
  
@srowen @hvanhovell this is a nice improvement and more elegant than the 
original approach. 

As a side node, In our code base (which uses PortableDataStream quite a 
bit), these changes didn't break anything so since it passes tests as well, I'd 
say it's good to go.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9417#issuecomment-153059875
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...

2015-11-02 Thread hvanhovell
GitHub user hvanhovell opened a pull request:

https://github.com/apache/spark/pull/9417

[SPARK-11449][Core] PortableDataStream should be a factory

```PortableDataStream``` maintains some internal state. This makes it 
tricky to reuse a stream (one needs to call ```close``` on both the 
```PortableDataStream``` and the ```InputStream``` it produces).

This PR removes all state from ```PortableDataStream``` and effectively 
turns it into an ```InputStream```/```Array[Byte]``` factory. This makes the 
user responsible for managing the ```InputStream``` it returns.

cc @srowen 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hvanhovell/spark SPARK-11449

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9417.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9417


commit 91b8c6cc86c83df161e176c7a4efbc3dd439d037
Author: Herman van Hovell 
Date:   2015-11-02T15:42:44Z

Removed state from PortableDataStream




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...

2015-11-02 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/9417#issuecomment-153059923
  
@kmader what do you think of this one?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...

2015-11-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9417#issuecomment-153060623
  
**[Test build #1967 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1967/consoleFull)**
 for PR 9417 at commit 
[`91b8c6c`](https://github.com/apache/spark/commit/91b8c6cc86c83df161e176c7a4efbc3dd439d037).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...

2015-11-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9417#issuecomment-153095896
  
**[Test build #1968 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1968/consoleFull)**
 for PR 9417 at commit 
[`91b8c6c`](https://github.com/apache/spark/commit/91b8c6cc86c83df161e176c7a4efbc3dd439d037).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...

2015-11-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9417#issuecomment-153133888
  
**[Test build #1968 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1968/consoleFull)**
 for PR 9417 at commit 
[`91b8c6c`](https://github.com/apache/spark/commit/91b8c6cc86c83df161e176c7a4efbc3dd439d037).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:\n  * 
`case class Corr(`\n  * `case class Corr(left: Expression, right: 
Expression)`\n  * `case class RepartitionByExpression(`\n  * `
logInfo(s\"Hive class not found $e\")`\n  * `logDebug(\"Hive class not 
found\", e)`\n


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org