[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9417 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/9417#issuecomment-154001929 Merged to master --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/9417#discussion_r43869486 --- Diff: core/src/main/scala/org/apache/spark/input/PortableDataStream.scala --- @@ -177,39 +170,24 @@ class PortableDataStream( } /** - * Create a new DataInputStream from the split and context + * Create a new DataInputStream from the split and context. The user of this method is responsible + * for closing the stream after usage. */ def open(): DataInputStream = { -if (!isOpen) { - val pathp = split.getPath(index) - val fs = pathp.getFileSystem(conf) - fileIn = fs.open(pathp) - isOpen = true -} -fileIn +val pathp = split.getPath(index) +val fs = pathp.getFileSystem(conf) +fs.open(pathp) } /** * Read the file as a byte array */ def toArray(): Array[Byte] = { -open() -val innerBuffer = ByteStreams.toByteArray(fileIn) -close() -innerBuffer - } - - /** - * Close the file (if it is currently open) - */ - def close(): Unit = { --- End diff -- My only last concern here is removing this method. It's `@Experimental` though it has been around a while. We could just keep the method as a no-op, in case some caller out there is calling it. Maybe deprecate it, and log a warning too. I'm not so fussed about the behavior change since it's again `@Experimental` and I think callers should have been closing the stream directly anyway. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/9417#issuecomment-153696906 LGTM. Let me leave it open a bit for comments. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/9417#discussion_r43870851 --- Diff: core/src/main/scala/org/apache/spark/input/PortableDataStream.scala --- @@ -177,39 +170,24 @@ class PortableDataStream( } /** - * Create a new DataInputStream from the split and context + * Create a new DataInputStream from the split and context. The user of this method is responsible + * for closing the stream after usage. */ def open(): DataInputStream = { -if (!isOpen) { - val pathp = split.getPath(index) - val fs = pathp.getFileSystem(conf) - fileIn = fs.open(pathp) - isOpen = true -} -fileIn +val pathp = split.getPath(index) +val fs = pathp.getFileSystem(conf) +fs.open(pathp) } /** * Read the file as a byte array */ def toArray(): Array[Byte] = { -open() -val innerBuffer = ByteStreams.toByteArray(fileIn) -close() -innerBuffer - } - - /** - * Close the file (if it is currently open) - */ - def close(): Unit = { --- End diff -- Added a deprecated ```close()``` method. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...
Github user kmader commented on the pull request: https://github.com/apache/spark/pull/9417#issuecomment-153550148 @srowen @hvanhovell this is a nice improvement and more elegant than the original approach. As a side node, In our code base (which uses PortableDataStream quite a bit), these changes didn't break anything so since it passes tests as well, I'd say it's good to go. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9417#issuecomment-153059875 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...
GitHub user hvanhovell opened a pull request: https://github.com/apache/spark/pull/9417 [SPARK-11449][Core] PortableDataStream should be a factory ```PortableDataStream``` maintains some internal state. This makes it tricky to reuse a stream (one needs to call ```close``` on both the ```PortableDataStream``` and the ```InputStream``` it produces). This PR removes all state from ```PortableDataStream``` and effectively turns it into an ```InputStream```/```Array[Byte]``` factory. This makes the user responsible for managing the ```InputStream``` it returns. cc @srowen You can merge this pull request into a Git repository by running: $ git pull https://github.com/hvanhovell/spark SPARK-11449 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/9417.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #9417 commit 91b8c6cc86c83df161e176c7a4efbc3dd439d037 Author: Herman van HovellDate: 2015-11-02T15:42:44Z Removed state from PortableDataStream --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/9417#issuecomment-153059923 @kmader what do you think of this one? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9417#issuecomment-153060623 **[Test build #1967 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1967/consoleFull)** for PR 9417 at commit [`91b8c6c`](https://github.com/apache/spark/commit/91b8c6cc86c83df161e176c7a4efbc3dd439d037). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9417#issuecomment-153095896 **[Test build #1968 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1968/consoleFull)** for PR 9417 at commit [`91b8c6c`](https://github.com/apache/spark/commit/91b8c6cc86c83df161e176c7a4efbc3dd439d037). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11449][Core] PortableDataStream should ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9417#issuecomment-153133888 **[Test build #1968 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1968/consoleFull)** for PR 9417 at commit [`91b8c6c`](https://github.com/apache/spark/commit/91b8c6cc86c83df161e176c7a4efbc3dd439d037). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_:\n * `case class Corr(`\n * `case class Corr(left: Expression, right: Expression)`\n * `case class RepartitionByExpression(`\n * ` logInfo(s\"Hive class not found $e\")`\n * `logDebug(\"Hive class not found\", e)`\n --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org