[GitHub] spark pull request: Updated scripts for auditing releases
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/844 Updated scripts for auditing releases - Added script to automatically generate change list CHANGES.txt - Added test for verifying linking against maven distributions of `spark-sql` and `spark-hive` - Added SBT projects for testing functionality of `spark-sql` and `spark-hive` - Fixed issues in existing tests that might have come up because of changes in Spark 1.0 You can merge this pull request into a Git repository by running: $ git pull https://github.com/tdas/spark update-dev-scripts Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/844.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #844 commit e2e20b3fc4c5ca390e8c19f18f7a798c4b4b96c3 Author: Tathagata Das tathagata.das1...@gmail.com Date: 2014-05-21T07:02:15Z Updated tests for auditing releases. commit 25090ba86833a38726b0bf00474929bdf90e8ac4 Author: Tathagata Das tathagata.das1...@gmail.com Date: 2014-05-21T07:10:03Z Added missing license --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: Updated scripts for auditing releases
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/844#issuecomment-43720269 @pwendell
[GitHub] spark pull request: Updated scripts for auditing releases
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/844#issuecomment-43720431 Merged build started.
[GitHub] spark pull request: Updated scripts for auditing releases
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/844#issuecomment-43720416 Merged build triggered.
[GitHub] spark pull request: Updated scripts for auditing releases
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/844#issuecomment-43723792 LGTM - thanks TD, this is great! Having SQL and Hive modules in there is awesome.
[GitHub] spark pull request: Updated scripts for auditing releases
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/844#issuecomment-43726777 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15114/
[GitHub] spark pull request: Updated scripts for auditing releases
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/844#issuecomment-43726776 Merged build finished.
[GitHub] spark pull request: [Docs] Correct example of creating a new Spark...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/842#issuecomment-43726780 Thanks. I've merged this into master and branch-1.0.
[GitHub] spark pull request: [SPARK-1250] Fixed misleading comments in bin/...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/843
[GitHub] spark pull request: Updated scripts for auditing releases
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/844#issuecomment-43726813 Jenkins, test this again.
[GitHub] spark pull request: [SPARK-1822] SchemaRDD.count() should use opti...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/841#issuecomment-43726915 He's on vacation this week so it might take a while for him to get back :)
[GitHub] spark pull request: [Minor] Move JdbcRDDSuite to the correct packa...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/839#issuecomment-43726958 Thanks. I've merged this into master and branch-1.0.
[GitHub] spark pull request: [Docs] Correct example of creating a new Spark...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/842
[GitHub] spark pull request: [SPARK-1880] [SQL] Eliminate unnecessary job e...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/825#issuecomment-43727197 We can easily add right outer join support to the hash join though. In general, the nested loop join performs very unfavorably compared with a hash join implementation.
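The hash-join advantage mentioned in the comment above can be sketched in a few lines of Scala. This is an illustrative toy, not Spark SQL's implementation; the function name and types are made up for the example.

```scala
// Toy hash join: build a hash table on one side once (O(|build|)),
// then probe it per row (O(|probe|)), instead of the O(|build| * |probe|)
// rescanning a nested-loop join would do.
object HashJoinSketch {
  def hashJoin[K, A, B](build: Seq[(K, A)], probe: Seq[(K, B)]): Seq[(K, (A, B))] = {
    // Build phase: group the build side by join key.
    val table: Map[K, Seq[A]] =
      build.groupBy(_._1).map { case (k, rows) => k -> rows.map(_._2) }
    // Probe phase: one hash lookup per probe row.
    probe.flatMap { case (k, b) =>
      table.getOrElse(k, Nil).map(a => (k, (a, b)))
    }
  }
}
```

As the sketch shows, only the probe side is streamed; making the smaller relation the build side keeps the hash table cheap.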
[GitHub] spark pull request: add support for left semi join
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/837#issuecomment-43727297 Jenkins, add to whitelist.
[GitHub] spark pull request: add support for left semi join
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/837#issuecomment-43727320 Merged build triggered.
[GitHub] spark pull request: add support for left semi join
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/837#issuecomment-43727343 Merged build started.
[GitHub] spark pull request: add support for left semi join
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/837#issuecomment-43727532 Merged build finished.
[GitHub] spark pull request: add support for left semi join
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/837#issuecomment-43727534 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15115/
[GitHub] spark pull request: add support for left semi join
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/837#issuecomment-43729433 Merged build triggered.
[GitHub] spark pull request: add support for left semi join
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/837#issuecomment-43729450 Merged build started.
[GitHub] spark pull request: [SPARK-1776] Have Spark's SBT build read depen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/772#issuecomment-43731406 Build triggered.
[GitHub] spark pull request: [SPARK-1776] Have Spark's SBT build read depen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/772#issuecomment-43731416 Build started.
[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43733545 `ensureFreeSpace` has two jobs: 1) iterate over the entries and select blocks to be dropped; 2) if the to-be-dropped blocks can free enough space, mark them as dropping and return them to the caller. `ensureFreeSpace` is called within the putLock, so each thread will see the dropping-flag modifications (I will discuss flag resetting in exception handling later) and thus get a different set of to-be-dropped blocks. And block reading doesn't need the dropping flag, so there is no conflict there.

Let's consider block removal and exception handling (resetting the dropping flag). Job 1 of `ensureFreeSpace` (selecting) and removal are both synchronized on `entries`, so they must proceed in turn. If a block is removed first, then everything is OK. If a block is removed after Job 2 of `ensureFreeSpace` (marking), which is also synchronized on `entries` (in my modification), then the block will be dropped to disk and managed by diskStore, which I think is OK. If a block is removed between selecting and marking, the marking step checks whether the entry is null, so it's OK too.

About exception handling: flag resetting is also synchronized on `entries`, so it cannot run during selecting or marking. If resetting happened before selecting, then selecting will be able to select these blocks and re-drop them. If resetting happened after selecting, the selected to-be-dropped blocks won't include the reset blocks, so there is no conflict.

Actually there are three places that write or read the dropping flag (selecting, marking and resetting), and they are all synchronized on `entries`, so I think we don't need to declare the flag as volatile.
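The select/mark scheme discussed in this thread can be modeled in a deliberately simplified sketch. Here both phases run under a single `entries` lock, the class and method names (`ToyStore`, `Entry`) are invented for illustration, and no actual dropping to disk happens; this is not the MemoryStore code under review.

```scala
import scala.collection.mutable

// Illustrative model only: an entry carries a `dropping` flag so that a block
// already selected for eviction by one caller is skipped by the next caller.
class Entry(val size: Long) { var dropping = false }

class ToyStore(val maxMemory: Long) {
  private val entries = mutable.LinkedHashMap[String, Entry]()
  private var currentMemory = 0L

  def put(id: String, size: Long): Unit = entries.synchronized {
    entries(id) = new Entry(size)
    currentMemory += size
  }

  // Phase 1 (select) and phase 2 (mark) both run under the entries lock here,
  // so the same block can never be handed to two concurrent callers.
  def ensureFreeSpace(space: Long): Seq[String] = entries.synchronized {
    var selectedMemory = 0L
    val selected = mutable.ArrayBuffer[String]()
    val it = entries.iterator
    while (maxMemory - (currentMemory - selectedMemory) < space && it.hasNext) {
      val (id, entry) = it.next()
      if (!entry.dropping) { // skip blocks another caller already claimed
        selected += id
        selectedMemory += entry.size
      }
    }
    // Mark: flag the selected blocks so concurrent callers do not reselect them.
    selected.foreach(id => entries.get(id).foreach(_.dropping = true))
    selected.toSeq
  }
}
```

With a 200-byte store holding blocks A and B of 100 bytes each, two successive `ensureFreeSpace(100)` calls pick A and then B, never A twice, because the first call marked A as dropping.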
[GitHub] spark pull request: [SPARK-1776] Have Spark's SBT build read depen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/772#issuecomment-43734386 Build finished.
[GitHub] spark pull request: [SPARK-1776] Have Spark's SBT build read depen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/772#issuecomment-43734389 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15117/
[GitHub] spark pull request: [SPARK-1820][tools] Make GenerateMimaIgnore @D...
GitHub user nikhils05 opened a pull request: https://github.com/apache/spark/pull/845 [SPARK-1820][tools] Make GenerateMimaIgnore @DeveloperApi annotation aware. Solution for : Add all the classes with DeveloperApi annotation in Mima excludes. You can merge this pull request into a Git repository by running: $ git pull https://github.com/nikhils05/spark tools Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/845.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #845 commit f495620d0f11e34dee67853dd0912fe20602d24d Author: nikhil7sh nikhilsharmalnm...@gmail.ccom Date: 2014-05-19T06:04:02Z (SPARK-1820) Make GenerateMimaIgnore @DeveloperApi annotation aware commit 6a7201b3bdbf917ea0054049eeaded13bfcbfd72 Author: nikhil7sh nikhilsharmalnm...@gmail.ccom Date: 2014-05-21T09:16:10Z [SPARK-1820] Make GenerateMimaIgnore @DeveloperApi annotation aware commit 8fa02d2c67f16556d7477f515603d472d2679a21 Author: nikhil7sh nikhilsharmalnm...@gmail.ccom Date: 2014-05-21T09:20:04Z [SPARK-1820] Make GenerateMimaIgnore @DeveloperApi annotation aware
[GitHub] spark pull request: [SPARK-1820][tools] Make GenerateMimaIgnore @D...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/845#issuecomment-43734997 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/791#discussion_r12888619

--- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala ---
@@ -243,10 +250,13 @@ private class MemoryStore(blockManager: BlockManager, maxMemory: Long)
       val iterator = entries.entrySet().iterator()
       while (maxMemory - (currentMemory - selectedMemory) < space && iterator.hasNext) {
         val pair = iterator.next()
-        val blockId = pair.getKey
-        if (rddToAdd.isEmpty || rddToAdd != getRddId(blockId)) {
-          selectedBlocks += blockId
-          selectedMemory += pair.getValue.size
+        val entry = pair.getValue
+        if (!entry.dropping) {
+          val blockId = pair.getKey
+          if (rddToAdd.isEmpty || rddToAdd != getRddId(blockId)) {
+            selectedBlocks += blockId
+            selectedMemory += entry.size
--- End diff --

As mentioned in the comments, I misread the variable - this is correct.
[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43736593
- With the latest commit, the issue with the dropping flag is gone - which is great.
- There is a change of behavior w.r.t. the earlier code. Whether the earlier code was the way it was intentionally or accidentally, I am not sure - I will let @mateiz or others comment.

Essentially there are a few things here:
a) What happens if an existing block is re-added? It looks like this was probably handled earlier too? I went up the call tree a bit, and it did not look like this was prevented - but maybe I missed it. Any comments @mateiz?
b) What happens if the same block is added in parallel by two threads? If this was a supported usecase, then the current PR breaks it - it is possible for the first thread to add the block, and the second to evict it from memory in case it was not possible to hold both copies in memory (according to the free space computed).
[GitHub] spark pull request: add support for left semi join
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/837#issuecomment-43737610 Merged build finished.
[GitHub] spark pull request: add support for left semi join
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/837#issuecomment-43737611 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15116/
[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43738827 @mridulm I checked the code of BlockManager#doPut:

    val putBlockInfo = {
      val tinfo = new BlockInfo(level, tellMaster)
      // Do atomically !
      val oldBlockOpt = blockInfo.putIfAbsent(blockId, tinfo)
      if (oldBlockOpt.isDefined) {
        if (oldBlockOpt.get.waitForReady()) {
          logWarning("Block " + blockId + " already exists on this machine; not re-adding it")
          return updatedBlocks
        }
        // TODO: So the block info exists - but previous attempt to load it (?) failed.
        // What do we do now ? Retry on it ?
        oldBlockOpt.get
      } else {
        tinfo
      }
    }

BlockManager creates a BlockInfo for the block to be added and calls `val oldBlockOpt = blockInfo.putIfAbsent(blockId, tinfo)`, so if multiple threads are adding the same block, one thread will put the BlockInfo successfully and the others will fail and stop the put.
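The `putIfAbsent` guard described in the comment above is easy to demonstrate in isolation. The sketch below is hypothetical (the object and method names are invented) and uses a plain `ConcurrentHashMap` rather than Spark's `TimeStampedHashMap`, but the single-winner property is the same: of N threads racing to register one block id, exactly one observes `null` and proceeds.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

object PutIfAbsentDemo {
  // Launch `threads` threads that all race to register the same block id;
  // return how many of them won the putIfAbsent race.
  def racingPuts(blockId: String, threads: Int): Int = {
    val blockInfo = new ConcurrentHashMap[String, AnyRef]()
    val winners = new AtomicInteger(0)
    val pool = (1 to threads).map { _ =>
      new Thread(() => {
        // putIfAbsent is atomic: only the first caller sees null and
        // "owns" the block; every other caller backs off.
        if (blockInfo.putIfAbsent(blockId, new Object) == null) {
          winners.incrementAndGet()
        }
      })
    }
    pool.foreach(_.start())
    pool.foreach(_.join())
    winners.get()
  }
}
```

Running `racingPuts` with any thread count always reports exactly one winner, which is why the duplicate-add concern is resolved before `MemoryStore` is ever reached on the second thread.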
[GitHub] spark pull request: [SPARK-1776] Have Spark's SBT build read depen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/772#issuecomment-43739167 Merged build started.
[GitHub] spark pull request: [SPARK-1880] [SQL] Eliminate unnecessary job e...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/825#issuecomment-43741634 @rxin Ah, you mean that we should add right/full outer join support in addition to #734? I agree about the unfavorable performance of the nested loop join, so we should wait for that to be merged and then add right/full outer join support in another issue.
[GitHub] spark pull request: [SPARK-1880] [SQL] Eliminate unnecessary job e...
Github user ueshin commented on the pull request: https://github.com/apache/spark/pull/825#issuecomment-43742049 @rxin BTW, speaking of performance, could you please review the code in #836? I think it is a kind of blocker issue for the join strategy.
[GitHub] spark pull request: [SPARK-1776] Have Spark's SBT build read depen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/772#issuecomment-43742247 Merged build finished.
[GitHub] spark pull request: [SPARK-1776] Have Spark's SBT build read depen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/772#issuecomment-43742248 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15118/
[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43744004 This seems really promising!! However, can you explain whether the following sequence of events is possible or not in `ensureFreeSpace`? Both thread 1 and thread 2 want to insert blocks of 100 bytes. Existing blocks include block A and block B of 100 bytes each, and the total capacity is 200 bytes. Next,
- Thread 1 selects block A (not marked yet) and exits the `entries.synchronized { // select }`
- Thread 2 selects block A as well (not marked yet) and exits the `entries.synchronized { // select }`
- Thread 1 enters `entries.synchronized { // mark }` and marks block A to be dropped
- Thread 2 also enters `entries.synchronized { // mark }` and marks block A to be dropped again (this seems to be possible since there is no double check to see whether each block has already been marked or not)
- Thread 1 then drops block A to disk
- Thread 2 tries to drop block A to disk as well, but since it is already dropped, no more action is taken.
- Both threads think that 100 bytes have been cleared. Hence 2 x 100 bytes are inserted after dropping only 100 bytes.

Is this sequence possible?
[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43744225 @cloud-fan there are multiple calls to memoryStore to directly put a block - not just from external addition. So looking at only doPut might not help ?
[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43744304 @tdas there is a dropping flag which prevents this. Or did I misunderstand your query ?
[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43744657 @tdas yes - thread 1 should set A's dropping to true; so thread 2 should not select it
[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43744797 Is that so? Since selection and marking occur in different `entries.synchronized` blocks, selection and marking are not atomic together. So two threads can select the same block before marking that block.
[GitHub] spark pull request: Updated scripts for auditing releases
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/844#issuecomment-43744964 Jenkins, retest this.
[GitHub] spark pull request: [SPARK-1776] Have Spark's SBT build read depen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/772#issuecomment-43749375 Merged build started.
[GitHub] spark pull request: [SPARK-1776] Have Spark's SBT build read depen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/772#issuecomment-43749356 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1776] Have Spark's SBT build read depen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/772#issuecomment-43751967 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1776] Have Spark's SBT build read depen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/772#issuecomment-43751952 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43752165 @tdas you missed an important thing. `tryToPut` calls `ensureFreeSpace` within the putLock, so one thread has to wait until another thread has finished both selecting and marking.
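The interplay described in this thread can be sketched with a small model (illustrative Scala, not Spark's actual `MemoryStore`; only the names `entries`, `putLock`, `ensureFreeSpace`, and `tryToPut` come from the discussion): selection and marking each take `entries.synchronized` separately, so `ensureFreeSpace` alone is not atomic, but `tryToPut` holds `putLock` around the whole sequence, so two threads can never select the same block.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.collection.mutable

// Minimal model of the locking structure under discussion (illustrative,
// not Spark's implementation).
class ToyMemoryStore {
  private case class Entry(var dropping: Boolean = false)
  private val entries = mutable.LinkedHashMap[String, Entry]()
  private val putLock = new Object
  val selections = new ConcurrentLinkedQueue[String]()

  def add(id: String): Unit = entries.synchronized { entries(id) = Entry() }

  // Selection and marking sit in *separate* entries.synchronized blocks,
  // so this method alone is not atomic...
  private def ensureFreeSpace(): Option[String] = {
    val selected = entries.synchronized {
      entries.find { case (_, e) => !e.dropping }.map(_._1)
    }
    selected.foreach { id =>
      entries.synchronized { entries(id).dropping = true } // mark as dropping
    }
    selected
  }

  // ...but tryToPut holds putLock around the whole select-and-mark
  // sequence, so two threads cannot select the same block.
  def tryToPut(): Unit = putLock.synchronized {
    ensureFreeSpace().foreach(selections.add)
  }
}
```

If `ensureFreeSpace` were called without `putLock` held, two threads could both pass the selection step before either marks the block, which is exactly the race discussed above.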
[GitHub] spark pull request: [SPARK-1776] Have Spark's SBT build read depen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/772#issuecomment-43753931 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15119/
[GitHub] spark pull request: [SPARK-1776] Have Spark's SBT build read depen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/772#issuecomment-43753928 Merged build finished.
[GitHub] spark pull request: [SPARK-1776] Have Spark's SBT build read depen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/772#issuecomment-43756438 Merged build finished.
[GitHub] spark pull request: [SPARK-1776] Have Spark's SBT build read depen...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/772#issuecomment-43756441 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15120/
[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43756945 @tdas as @cloud-fan stated, the code relies on the implementation detail that the private method is always called within the context of a `tryToPut` lock, and is not called by anyone else. I don't like the fact that we have locking state spread out like this, but then this is how it already was, I guess... Maybe we should at best annotate the method? And possibly assert that it is within the `tryToPut` lock?
[GitHub] spark pull request: add support for left semi join
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/837#issuecomment-43775013 Merged build triggered.
[GitHub] spark pull request: add support for left semi join
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/837#issuecomment-43775032 Merged build started.
[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43778259 @tdas @mridulm what about moving the `putLock.synchronized` into `ensureFreeSpace` and letting `tryToPut` call `ensureFreeSpace` directly? I think it will be clearer this way.
[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...
Github user mridulm commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43783643 @cloud-fan that makes more sense. Also, please rename it to something more appropriate (since it is no longer trying to put within that block!) @tdas, can you also comment on the use cases/flows I mentioned above?
[GitHub] spark pull request: [SPARK-1822] SchemaRDD.count() should use opti...
Github user kanzhang commented on the pull request: https://github.com/apache/spark/pull/841#issuecomment-43786603 @rxin thanks for the heads up. I'd appreciate help from anyone in burning down my open PRs, the oldest being over a month old.
[GitHub] spark pull request: add support for left semi join
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/837#issuecomment-43788086 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15121/
[GitHub] spark pull request: add support for left semi join
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/837#issuecomment-43788084 Merged build finished.
[GitHub] spark pull request: [WIP]Improve ALS algorithm resource usage
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/828#issuecomment-43790944 @mateiz, @mengxr I am using [the code](https://github.com/witgo/spark/compare/cachePoint) to test ALS. A brief description of the test:

| Item | Description |
| --- | --- |
| cluster | `3 servers`, `36 core cpus`, `2.5T HDD`, `120G memory` |
| data | `700 million` |
| code | `val model = ALS.trainImplicit(ratings, 25, 30, 0.065, -1, 40.0)` |
| time | `12.5 h` |
| shuffle write | `4.72T` |
| largest local dir | `200G` |
[GitHub] spark pull request: [SPARK-1888] enhance MEMORY_AND_DISK mode by d...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/791#issuecomment-43795376 @cloud-fan @mridulm Aaah, I get it now. I knew I was missing something! I agree with @mridulm that this is a tricky lock structure and needs to be cleaner. Putting the `putLock.synchronized` inside `ensureFreeSpace` is definitely better, as it co-locates the important locks in the code, so they are less likely to be missed, as I did. Maybe rename it to `ensureFreeSpaceLock`? Or how about synchronizing on `this` (that is, `def ensureFreeSpace(...): ReturnType = synchronized { ... }`)? Also, please add a few more lines to the scaladoc of `ensureFreeSpace` explaining this lock structure and the high-level selection and marking steps. You could give the higher-level flow (select, mark, drop, exception handling) in the scaladoc of `MemoryStore`. On a related note, have you run any long, rigorous test on this to make sure that (1) this new lock structure is not accidentally causing deadlocks (this has happened before and was found only by running a long test), and (2) the memory limit is maintained at all times (to catch any race condition like the one I suggested, even if it is remotely possible)?
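The two synchronization styles weighed in this comment (an explicit `putLock` object vs. `def ... = synchronized { ... }` on `this`) can be contrasted with a toy class; this is a hedged sketch with made-up names, not the PR's code:

```scala
// Contrast of the two styles discussed above. In real code you would pick
// exactly one: the two methods below lock *different* monitors (putLock
// vs. this), so mixing them to guard the same state would not be safe.
class FreeSpaceManager(private var free: Long) {

  // Style 1: an explicit lock object, as in the current putLock-based code.
  private val putLock = new Object
  def ensureFreeSpaceWithLockObject(needed: Long): Boolean =
    putLock.synchronized {
      if (free >= needed) { free -= needed; true } else false
    }

  // Style 2: synchronize on `this`, as suggested in the comment above.
  def ensureFreeSpaceOnThis(needed: Long): Boolean = synchronized {
    if (free >= needed) { free -= needed; true } else false
  }
}
```

Either way the check-and-reserve step becomes atomic inside the method itself, so a caller like `tryToPut` can no longer forget to take the lock.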
[GitHub] spark pull request: [SPARK-1896] Respect spark.master before MASTE...
GitHub user andrewor14 opened a pull request: https://github.com/apache/spark/pull/846 [SPARK-1896] Respect spark.master before MASTER in REPL The hierarchy for the shell is as follows:

```
MASTER
--master
spark.master (spark-defaults.conf)
```

This is inconsistent with the way we run normal applications, which is:

```
--master
spark.master (spark-defaults.conf)
MASTER
```

I was trying to run a shell locally on a standalone cluster launched through the ec2 scripts, which automatically set `MASTER` in spark-env.sh. It was surprising to me that `--master` didn't take effect. You can merge this pull request into a Git repository by running: $ git pull https://github.com/andrewor14/spark shell-master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/846.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #846 commit 2cb81c9ed313976e23ae169c20bc930efc259756 Author: Andrew Or andrewo...@gmail.com Date: 2014-05-21T18:43:36Z Respect spark.master before MASTER in REPL
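The precedence change can be summarized in a few lines (a sketch; the function name and the `local[*]` fallback are illustrative assumptions, not the REPL's actual code):

```scala
// Resolve the shell's master with the precedence this PR establishes:
// --master flag, then spark.master from spark-defaults.conf, then the
// MASTER environment variable.
def resolveMaster(
    masterFlag: Option[String], // from --master
    confMaster: Option[String], // spark.master in spark-defaults.conf
    masterEnv: Option[String]   // MASTER environment variable
): String =
  masterFlag.orElse(confMaster).orElse(masterEnv).getOrElse("local[*]")
```

With this ordering, an `ec2`-provisioned `MASTER` in spark-env.sh no longer shadows an explicit `--master` flag.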
[GitHub] spark pull request: [SPARK-1896] Respect spark.master before MASTE...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/846#issuecomment-43798409 Merged build triggered.
[GitHub] spark pull request: [SPARK-1896] Respect spark.master before MASTE...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/846#issuecomment-43798432 Merged build started.
[GitHub] spark pull request: [Typo] Stoped - Stopped
GitHub user andrewor14 opened a pull request: https://github.com/apache/spark/pull/847 [Typo] Stoped - Stopped You can merge this pull request into a Git repository by running: $ git pull https://github.com/andrewor14/spark yarn-typo Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/847.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #847 commit c1906afd0b8946bf308b388f1928779e50d4fa5b Author: Andrew Or andrewo...@gmail.com Date: 2014-05-21T18:50:44Z Stoped - Stopped
[GitHub] spark pull request: [SPARK-1519] Support minPartitions param of wh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/697#issuecomment-43798939 Merged build triggered.
[GitHub] spark pull request: [SPARK-1519] Support minPartitions param of wh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/697#issuecomment-43798959 Merged build started.
[GitHub] spark pull request: [Typo] Stoped - Stopped
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/847#issuecomment-43798932 Merged build triggered.
[GitHub] spark pull request: [Typo] Stoped - Stopped
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/847#issuecomment-43798958 Merged build started.
[GitHub] spark pull request: [SPARK-1519] Support minPartitions param of wh...
Github user ahirreddy commented on the pull request: https://github.com/apache/spark/pull/697#issuecomment-43799163 LGTM
[GitHub] spark pull request: [SPARK-1519] Support minPartitions param of wh...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/697#issuecomment-43799234 Thanks. I will merge once Travis returns.
[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/848 [SPARK-1870] Make spark-submit --jars work in yarn-cluster mode. Sends secondary jars to the distributed cache of all containers and adds the cached jars to the classpath before executors start. `spark-submit --jars` also works in standalone server and `yarn-client` mode. Thanks to @andrewor14 for testing! I removed "Doesn't work for drivers in standalone mode with cluster deploy mode." from `spark-submit`'s help message, though we haven't tested mesos yet. CC: @dbtsai @sryza You can merge this pull request into a Git repository by running: $ git pull https://github.com/mengxr/spark yarn-classpath Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/848.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #848 commit dc3c825934cbd62566d09d3f2b4334dcc444879a Author: Xiangrui Meng m...@databricks.com Date: 2014-05-21T17:51:43Z add secondary jars to classpath in yarn commit 3e7e1c4a2fe1a9d8512c19e56df91b34bea58108 Author: Xiangrui Meng m...@databricks.com Date: 2014-05-21T18:21:09Z use sparkConf instead of hadoop conf commit 11e535434940d0809bd8c1380b2d4a92d87ebb6a Author: Xiangrui Meng m...@databricks.com Date: 2014-05-21T18:45:25Z minor changes commit 65e04ad8296969445e4ecfaa8921d55fe1e39c74 Author: Xiangrui Meng m...@databricks.com Date: 2014-05-21T18:52:02Z update spark-submit help message and add a comment for yarn-client
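The mechanism the PR describes, recording the secondary jars' distributed-cache link names under a conf key so the containers can later add them to the classpath, might be sketched like this (hypothetical helper; the key name is an illustrative guess based on the `CONF_SPARK_YARN_SECONDARY_JARS` constant quoted in the diffs of this thread):

```scala
// Hedged sketch: the client side records the cache link names of the
// secondary jars as a comma-separated SparkConf value; the container side
// splits the value back out and prepends each entry to the classpath.
def recordSecondaryJars(jars: Seq[String]): (String, String) = {
  val linkNames = jars.map(_.split("/").last) // cache link name = file name
  ("spark.yarn.secondary.jars", linkNames.mkString(","))
}
```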
[GitHub] spark pull request: [Typo] Stoped - Stopped
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/847#issuecomment-43799605 Thanks. I've merged this into master and branch-1.0.
[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/848#issuecomment-43800111 Merged build started.
[GitHub] spark pull request: [SPARK-1822] SchemaRDD.count() should use opti...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/841#discussion_r12916889 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDD.scala --- @@ -274,6 +274,10 @@ class SchemaRDD( seed: Long) = new SchemaRDD(sqlContext, Sample(fraction, withReplacement, seed, logicalPlan)) + override def count(): Long = { --- End diff -- Do you mind adding javadoc for this? Just explain that, unlike RDD's count, SchemaRDD's count actually invokes the optimizer.
[GitHub] spark pull request: Enable repartitioning of graph over different ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/719#issuecomment-43803035 @ankurdave is this good now?
[GitHub] spark pull request: Enable repartitioning of graph over different ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/719#issuecomment-43803060 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-1896] Respect spark.master before MASTE...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/846#issuecomment-43803106 Merged build finished.
[GitHub] spark pull request: [SPARK-1896] Respect spark.master before MASTE...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/846#issuecomment-43803107 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15122/
[GitHub] spark pull request: Enable repartitioning of graph over different ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/719#issuecomment-43803413 Build triggered.
[GitHub] spark pull request: Enable repartitioning of graph over different ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/719#issuecomment-43803425 Build started.
[GitHub] spark pull request: [Typo] Stoped - Stopped
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/847#issuecomment-43803493 Merged build finished. All automated tests passed.
[GitHub] spark pull request: [Typo] Stoped - Stopped
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/847#issuecomment-43803495 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15123/
[GitHub] spark pull request: [SPARK-1519] Support minPartitions param of wh...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/697#issuecomment-43803492 Merged build finished. All automated tests passed.
[GitHub] spark pull request: Enable repartitioning of graph over different ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/719#issuecomment-43804545 Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15126/
[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/848#issuecomment-43804541 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15125/
[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/848#issuecomment-43804540 Merged build finished. All automated tests passed.
[GitHub] spark pull request: Enable repartitioning of graph over different ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/719#issuecomment-43804544 Build finished.
[GitHub] spark pull request: [SPARK-1519] Support minPartitions param of wh...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/697
[GitHub] spark pull request: [SPARK-1519] Support minPartitions param of wh...
Github user kanzhang commented on the pull request: https://github.com/apache/spark/pull/697#issuecomment-43810385 @rxin @ahirreddy, thanks for the quick response!
[GitHub] spark pull request: [SPARK-1822] SchemaRDD.count() should use opti...
Github user kanzhang commented on a diff in the pull request: https://github.com/apache/spark/pull/841#discussion_r12921105 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDD.scala --- @@ -274,6 +274,10 @@ class SchemaRDD( seed: Long) = new SchemaRDD(sqlContext, Sample(fraction, withReplacement, seed, logicalPlan)) + override def count(): Long = { --- End diff -- Sure, will do.
[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/848#discussion_r12921552

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -479,37 +485,24 @@ object ClientBase {
     extraClassPath.foreach(addClasspathEntry)
-    addClasspathEntry(Environment.PWD.$())
+    val cachedSecondaryJarLinks =
+      sparkConf.getOption(CONF_SPARK_YARN_SECONDARY_JARS).getOrElse("").split(",")
     // Normally the user's app.jar is last in case of conflicts with the Spark jars
     if (sparkConf.get("spark.yarn.user.classpath.first", "false").toBoolean) {
--- End diff --

What is the difference between `spark.yarn.user.classpath.first` and `spark.files.userClassPathFirst`? To me, they look like the same thing under two different configuration names.
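The `cachedSecondaryJarLinks` expression in the diff reads a comma-separated jar list out of configuration. A small runnable sketch of that idiom (using a plain `Map` in place of `SparkConf`, and a made-up config key): `getOrElse("")` guards the missing-key case, and a `filter` is worth noting because `"".split(",")` yields `Array("")`, not an empty array.

```scala
// Illustrative config map standing in for SparkConf; the key name is hypothetical.
val conf = Map("spark.yarn.secondary.jars" -> "a.jar,b.jar")

val cachedSecondaryJarLinks =
  conf.get("spark.yarn.secondary.jars").getOrElse("").split(",").filter(_.nonEmpty)

println(cachedSecondaryJarLinks.toList)  // List(a.jar, b.jar)

// With the key absent, "".split(",") returns Array(""), so the filter matters:
val empty = Map.empty[String, String]
  .get("spark.yarn.secondary.jars").getOrElse("").split(",").filter(_.nonEmpty)
println(empty.length)  // 0
```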
[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/848#discussion_r12921709

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -479,37 +485,24 @@ object ClientBase {
     extraClassPath.foreach(addClasspathEntry)
-    addClasspathEntry(Environment.PWD.$())
+    val cachedSecondaryJarLinks =
+      sparkConf.getOption(CONF_SPARK_YARN_SECONDARY_JARS).getOrElse("").split(",")
     // Normally the user's app.jar is last in case of conflicts with the Spark jars
     if (sparkConf.get("spark.yarn.user.classpath.first", "false").toBoolean) {
--- End diff --

PS: in line 47,

 * 1. In standalone mode, it will launch an [[org.apache.spark.deploy.yarn.ApplicationMaster]]

should it say "cluster mode" now?
[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/848#issuecomment-43812877 Thanks. It looks great to me, and better than my patch. `cachedSecondaryJarLinks.foreach(addPwdClasspathEntry)` is not needed since we have `addPwdClasspathEntry("*")`. But later we may change the priority of the jars, since we add them explicitly. This patch also works for me.
[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/848#issuecomment-43814337 The symbolic links may not be under the PWD. That is why it didn't work before.
[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/848#issuecomment-43814642 It worked under the driver before, so the main issue is that those files are not in the executor's distributed cache. But I like the idea of adding them explicitly, so we won't miss anything.
[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/848#issuecomment-43815204 Yes, we can also control the ordering in this way.
[GitHub] spark pull request: [SPARK-1870] Make spark-submit --jars work in ...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/848#discussion_r12923791

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -479,37 +485,24 @@ object ClientBase {
     extraClassPath.foreach(addClasspathEntry)
-    addClasspathEntry(Environment.PWD.$())
+    val cachedSecondaryJarLinks =
+      sparkConf.getOption(CONF_SPARK_YARN_SECONDARY_JARS).getOrElse("").split(",")
     // Normally the user's app.jar is last in case of conflicts with the Spark jars
     if (sparkConf.get("spark.yarn.user.classpath.first", "false").toBoolean) {
--- End diff --

`spark.files.userClassPathFirst` is a global configuration that controls the ordering of dynamically added jars, while `spark.yarn.user.classpath.first` is only for YARN. I agree it is a little confusing, but this is independent of this PR. We can create a new JIRA for it.
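The distinction mengxr describes, a global flag versus a YARN-only flag that both decide whether user jars take precedence over Spark's jars, can be sketched as follows. This is an illustrative resolution function only; the flag names mirror the discussion, but the combination logic here is an assumption, not Spark's actual implementation.

```scala
// Hypothetical resolution of two overlapping boolean config flags:
// a YARN-specific one and a global one. On YARN either flag can enable
// user-classpath-first ordering; elsewhere only the global flag applies.
def userClassPathFirst(conf: Map[String, String], onYarn: Boolean): Boolean = {
  val yarnFlag   = conf.get("spark.yarn.user.classpath.first").exists(_.toBoolean)
  val globalFlag = conf.get("spark.files.userClassPathFirst").exists(_.toBoolean)
  if (onYarn) yarnFlag || globalFlag else globalFlag
}

println(userClassPathFirst(Map("spark.yarn.user.classpath.first" -> "true"), onYarn = true))   // true
println(userClassPathFirst(Map("spark.yarn.user.classpath.first" -> "true"), onYarn = false))  // false
```

Collapsing such overlapping flags into one documented setting (the follow-up JIRA mengxr suggests) is the usual fix for this kind of confusion.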