[jira] [Commented] (SPARK-5393) Flood of util.RackResolver log messages after SPARK-1714

2015-03-16 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363423#comment-14363423 ] Sandy Ryza commented on SPARK-5393: --- Hi [~djp], there's no special reason we didn't fix

[jira] [Commented] (SPARK-4921) TaskSetManager mistakenly returns PROCESS_LOCAL for NO_PREF tasks

2015-03-12 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355366#comment-14355366 ] Sandy Ryza commented on SPARK-4921: --- I'm going to close this as Won't Fix as this has

[jira] [Updated] (SPARK-6300) sc.addFile(path) does not support the relative path.

2015-03-12 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-6300: -- Priority: Critical (was: Major) Target Version/s: 1.3.1 Affects Version/s: 1.3.0

[jira] [Updated] (SPARK-6300) sc.addFile(path) does not support the relative path.

2015-03-12 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-6300: -- Priority: Critical (was: Minor) Target Version/s: 1.3.1 sc.addFile(path) does not support

[jira] [Resolved] (SPARK-4921) TaskSetManager mistakenly returns PROCESS_LOCAL for NO_PREF tasks

2015-03-10 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-4921. --- Resolution: Won't Fix TaskSetManager mistakenly returns PROCESS_LOCAL for NO_PREF tasks

[jira] [Resolved] (SPARK-1956) Enable shuffle consolidation by default

2015-03-10 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-1956. --- Resolution: Won't Fix Enable shuffle consolidation by default

[jira] [Resolved] (SPARK-2114) groupByKey and joins on raw data

2015-03-10 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-2114. --- Resolution: Duplicate groupByKey and joins on raw data

[jira] [Commented] (SPARK-2819) Difficult to turn on intercept with linear models

2015-03-10 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355398#comment-14355398 ] Sandy Ryza commented on SPARK-2819: --- With the pipelines API superseding this, I think we

[jira] [Resolved] (SPARK-2819) Difficult to turn on intercept with linear models

2015-03-10 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-2819. --- Resolution: Invalid Difficult to turn on intercept with linear models

[jira] [Resolved] (SPARK-4456) Document why spilling depends on both elements read and memory used

2015-03-10 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-4456. --- Resolution: Invalid Closing this in light of changes to when spilling occurs. Document why spilling

[jira] [Comment Edited] (SPARK-2819) Difficult to turn on intercept with linear models

2015-03-10 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355398#comment-14355398 ] Sandy Ryza edited comment on SPARK-2819 at 3/10/15 6:23 PM:

[jira] [Commented] (SPARK-1956) Enable shuffle consolidation by default

2015-03-10 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355390#comment-14355390 ] Sandy Ryza commented on SPARK-1956: --- Closing this as Won't Fix now that we've moved

[jira] [Commented] (SPARK-4911) Report the inputs and outputs of Spark jobs so that external systems can track data lineage

2015-03-09 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353554#comment-14353554 ] Sandy Ryza commented on SPARK-4911: --- I know that [~malaskat] has played around with a

[jira] [Commented] (SPARK-5490) KMeans costs can be incorrect if tasks need to be rerun

2015-02-23 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334039#comment-14334039 ] Sandy Ryza commented on SPARK-5490: --- The relevant JIRA is SPARK-732, but it's marked as

[jira] [Commented] (SPARK-5906) Input read size incorrect for Parquet files

2015-02-18 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327007#comment-14327007 ] Sandy Ryza commented on SPARK-5906: --- That stack trace doesn't necessarily indicate to me

[jira] [Commented] (SPARK-5906) Input read size incorrect for Parquet files

2015-02-18 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326992#comment-14326992 ] Sandy Ryza commented on SPARK-5906: --- Hmm, that's definitely not the expected behavior.

[jira] [Commented] (SPARK-5736) Add executor log url to Executors page on Yarn

2015-02-11 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316949#comment-14316949 ] Sandy Ryza commented on SPARK-5736: --- Is this the same as SPARK-2450? Add executor log

[jira] [Comment Edited] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form

2015-02-08 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310130#comment-14310130 ] Sandy Ryza edited comment on SPARK-4550 at 2/8/15 9:07 PM: --- I

[jira] [Commented] (SPARK-4617) Fix spark.yarn.applicationMaster.waitTries doc

2015-02-08 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311765#comment-14311765 ] Sandy Ryza commented on SPARK-4617: --- This got fixed by SPARK-3779. Closing now. Fix

[jira] [Resolved] (SPARK-4617) Fix spark.yarn.applicationMaster.waitTries doc

2015-02-08 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-4617. --- Resolution: Not a Problem Fix spark.yarn.applicationMaster.waitTries doc

[jira] [Comment Edited] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form

2015-02-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310130#comment-14310130 ] Sandy Ryza edited comment on SPARK-4550 at 2/6/15 11:13 PM: I

[jira] [Comment Edited] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form

2015-02-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310130#comment-14310130 ] Sandy Ryza edited comment on SPARK-4550 at 2/6/15 11:08 PM: I

[jira] [Commented] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form

2015-02-06 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310130#comment-14310130 ] Sandy Ryza commented on SPARK-4550: --- I got a working prototype and benchmarked the

[jira] [Updated] (SPARK-5645) Track local bytes read for shuffles - update UI

2015-02-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-5645: -- Assignee: Kostas Sakellis Track local bytes read for shuffles - update UI

[jira] [Updated] (SPARK-5646) Record output metrics for cache

2015-02-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-5646: -- Assignee: Kostas Sakellis Record output metrics for cache ---

[jira] [Commented] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form

2015-02-04 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304864#comment-14304864 ] Sandy Ryza commented on SPARK-4550: --- I had heard rumors to that effect, so I ran some

[jira] [Commented] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form

2015-02-04 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304816#comment-14304816 ] Sandy Ryza commented on SPARK-4550: --- WIP branch:

[jira] [Assigned] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form

2015-02-04 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned SPARK-4550: - Assignee: Sandy Ryza In sort-based shuffle, store map outputs in serialized form

[jira] [Commented] (SPARK-5529) Executor is still hold while BlockManager has been removed

2015-02-04 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305706#comment-14305706 ] Sandy Ryza commented on SPARK-5529: --- [~shenhong] [~lianhuiwang] both of these patches

[jira] [Commented] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form

2015-02-04 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306321#comment-14306321 ] Sandy Ryza commented on SPARK-4550: --- I also just tried this out using an object that's

[jira] [Comment Edited] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form

2015-02-04 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14304864#comment-14304864 ] Sandy Ryza edited comment on SPARK-4550 at 2/5/15 12:36 AM: I

[jira] [Created] (SPARK-5581) When writing sorted map output file, avoid open / close between each partition

2015-02-03 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-5581: - Summary: When writing sorted map output file, avoid open / close between each partition Key: SPARK-5581 URL: https://issues.apache.org/jira/browse/SPARK-5581 Project:

[jira] [Updated] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form

2015-02-02 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4550: -- Attachment: SPARK-4550-design-v1.pdf In sort-based shuffle, store map outputs in serialized form

[jira] [Commented] (SPARK-4550) In sort-based shuffle, store map outputs in serialized form

2015-02-02 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302303#comment-14302303 ] Sandy Ryza commented on SPARK-4550: --- Just posted a design doc. Would love to get

[jira] [Commented] (SPARK-5492) Thread statistics can break with older Hadoop versions

2015-02-01 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14300814#comment-14300814 ] Sandy Ryza commented on SPARK-5492: --- After seeing this I tried with 1.0.4 and didn't hit

[jira] [Created] (SPARK-5500) Document that feeding hadoopFile into a shuffle operation will cause problems

2015-01-30 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-5500: - Summary: Document that feeding hadoopFile into a shuffle operation will cause problems Key: SPARK-5500 URL: https://issues.apache.org/jira/browse/SPARK-5500 Project: Spark

[jira] [Updated] (SPARK-5151) Parquet Predicate Pushdown Does Not Work with Nested Structures.

2015-01-29 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-5151: -- Component/s: SQL Parquet Predicate Pushdown Does Not Work with Nested Structures.

[jira] [Updated] (SPARK-5151) Parquet Predicate Pushdown Does Not Work with Nested Structures.

2015-01-29 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-5151: -- Component/s: (was: Spark Core) Parquet Predicate Pushdown Does Not Work with Nested Structures.

[jira] [Reopened] (SPARK-603) add simple Counter API

2015-01-29 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reopened SPARK-603: -- add simple Counter API -- Key: SPARK-603 URL:

[jira] [Commented] (SPARK-5492) Thread statistics can break with older Hadoop versions

2015-01-29 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298138#comment-14298138 ] Sandy Ryza commented on SPARK-5492: --- Very weird. I'll look into it. Did that come up

[jira] [Commented] (SPARK-5492) Thread statistics can break with older Hadoop versions

2015-01-29 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298190#comment-14298190 ] Sandy Ryza commented on SPARK-5492: --- Are you able to provide any more detail on the

[jira] [Assigned] (SPARK-5492) Thread statistics can break with older Hadoop versions

2015-01-29 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned SPARK-5492: - Assignee: Sandy Ryza Thread statistics can break with older Hadoop versions

[jira] [Created] (SPARK-5458) Refer to aggregateByKey instead of combineByKey in docs

2015-01-28 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-5458: - Summary: Refer to aggregateByKey instead of combineByKey in docs Key: SPARK-5458 URL: https://issues.apache.org/jira/browse/SPARK-5458 Project: Spark Issue Type:

[jira] [Updated] (SPARK-5458) Refer to aggregateByKey instead of combineByKey in docs

2015-01-28 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-5458: -- Priority: Trivial (was: Minor) Refer to aggregateByKey instead of combineByKey in docs

[jira] [Commented] (SPARK-5097) Adding data frame APIs to SchemaRDD

2015-01-27 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294193#comment-14294193 ] Sandy Ryza commented on SPARK-5097: --- Ah, yeah, I hadn't considered that aspect. I

[jira] [Commented] (SPARK-5097) Adding data frame APIs to SchemaRDD

2015-01-27 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293849#comment-14293849 ] Sandy Ryza commented on SPARK-5097: --- Would it be possible to keep the Python versions of

[jira] [Commented] (SPARK-2688) Need a way to run multiple data pipeline concurrently

2015-01-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291184#comment-14291184 ] Sandy Ryza commented on SPARK-2688: --- [~xuefuz] Spark already has transformations that

[jira] [Resolved] (SPARK-3622) Provide a custom transformation that can output multiple RDDs

2015-01-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-3622. --- Resolution: Not a Problem Provide a custom transformation that can output multiple RDDs

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2015-01-24 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290885#comment-14290885 ] Sandy Ryza commented on SPARK-4452: --- I think there's more to this one, the subtasks

[jira] [Commented] (SPARK-2688) Need a way to run multiple data pipeline concurrently

2015-01-23 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289813#comment-14289813 ] Sandy Ryza commented on SPARK-2688: --- I agree that this is worth keeping open. Allowing

[jira] [Updated] (SPARK-2688) Need a way to run multiple data pipeline concurrently

2015-01-23 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-2688: -- Issue Type: New Feature (was: Improvement) Need a way to run multiple data pipeline concurrently

[jira] [Reopened] (SPARK-2688) Need a way to run multiple data pipeline concurrently

2015-01-23 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reopened SPARK-2688: --- Need a way to run multiple data pipeline concurrently

[jira] [Created] (SPARK-5393) Flood of util.RackResolver log messages after SPARK-1714

2015-01-23 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-5393: - Summary: Flood of util.RackResolver log messages after SPARK-1714 Key: SPARK-5393 URL: https://issues.apache.org/jira/browse/SPARK-5393 Project: Spark Issue Type:

[jira] [Updated] (SPARK-4136) Under dynamic allocation, cancel outstanding executor requests when no longer needed

2015-01-22 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4136: -- Summary: Under dynamic allocation, cancel outstanding executor requests when no longer needed (was:

[jira] [Commented] (SPARK-1714) Take advantage of AMRMClient APIs to simplify logic in YarnAllocationHandler

2015-01-22 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287753#comment-14287753 ] Sandy Ryza commented on SPARK-1714: --- Oops yeah, my bad. Filed SPARK-5370 and posted a

[jira] [Created] (SPARK-5370) Remove some unnecessary synchronization in YarnAllocator

2015-01-22 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-5370: - Summary: Remove some unnecessary synchronization in YarnAllocator Key: SPARK-5370 URL: https://issues.apache.org/jira/browse/SPARK-5370 Project: Spark Issue Type:

[jira] [Commented] (SPARK-5347) InputMetrics bug when inputSplit is not instanceOf FileSplit

2015-01-21 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286729#comment-14286729 ] Sandy Ryza commented on SPARK-5347: --- Hi [~shenhong], I think this may be a duplicate of

[jira] [Commented] (SPARK-4630) Dynamically determine optimal number of partitions

2015-01-19 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283225#comment-14283225 ] Sandy Ryza commented on SPARK-4630: --- One way I was thinking it might make sense to

[jira] [Commented] (SPARK-4630) Dynamically determine optimal number of partitions

2015-01-19 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283146#comment-14283146 ] Sandy Ryza commented on SPARK-4630: --- [~rxin] I agree that there are probably a ton of

[jira] [Updated] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2015-01-14 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-3655: -- Priority: Major (was: Minor) Support sorting of values in addition to keys (i.e. secondary sort)

[jira] [Updated] (SPARK-4924) Factor out code to launch Spark applications into a separate library

2015-01-11 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4924: -- Assignee: Marcelo Vanzin Factor out code to launch Spark applications into a separate library

[jira] [Created] (SPARK-5199) Input metrics should show up for InputFormats that return CombineFileSplits

2015-01-11 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-5199: - Summary: Input metrics should show up for InputFormats that return CombineFileSplits Key: SPARK-5199 URL: https://issues.apache.org/jira/browse/SPARK-5199 Project: Spark

[jira] [Commented] (SPARK-2621) Update task InputMetrics incrementally

2015-01-11 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273192#comment-14273192 ] Sandy Ryza commented on SPARK-2621: --- Definitely - just filed SPARK-5199 for this.

[jira] [Commented] (SPARK-4159) Maven build doesn't run JUnit test suites

2015-01-07 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268673#comment-14268673 ] Sandy Ryza commented on SPARK-4159: --- [~pwendell] [~srowen] After this change it looks

[jira] [Created] (SPARK-5112) Expose SizeEstimator as a developer API

2015-01-06 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-5112: - Summary: Expose SizeEstimator as a developer API Key: SPARK-5112 URL: https://issues.apache.org/jira/browse/SPARK-5112 Project: Spark Issue Type: Improvement

[jira] [Updated] (SPARK-5087) Merge yarn.Client and yarn.ClientBase

2015-01-04 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-5087: -- Summary: Merge yarn.Client and yarn.ClientBase (was: Consolidate yarn.Client and yarn.ClientBase)

[jira] [Created] (SPARK-5087) Consolidate yarn.Client and yarn.ClientBase

2015-01-04 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-5087: - Summary: Consolidate yarn.Client and yarn.ClientBase Key: SPARK-5087 URL: https://issues.apache.org/jira/browse/SPARK-5087 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-4921) Performance issue caused by TaskSetManager returning PROCESS_LOCAL for NO_PREF tasks

2014-12-30 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14261384#comment-14261384 ] Sandy Ryza commented on SPARK-4921: --- Ah, makes sense. In the query, are some splits

[jira] [Commented] (SPARK-4921) Performance issue caused by TaskSetManager returning PROCESS_LOCAL for NO_PREF tasks

2014-12-29 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260322#comment-14260322 ] Sandy Ryza commented on SPARK-4921: --- Offline [~xuefuz] and [~lirui] mentioned to me that

[jira] [Commented] (SPARK-4921) Performance issue caused by TaskSetManager returning PROCESS_LOCAL for NO_PREF tasks

2014-12-29 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260325#comment-14260325 ] Sandy Ryza commented on SPARK-4921: --- [~xuefuz] [~lirui] was that query against data not

[jira] [Updated] (SPARK-4585) Spark dynamic executor allocation shouldn't use maxExecutors as initial number

2014-12-27 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4585: -- Summary: Spark dynamic executor allocation shouldn't use maxExecutors as initial number (was: Spark

[jira] [Commented] (SPARK-4921) Performance issue caused by TaskSetManager returning PROCESS_LOCAL for NO_PREF tasks

2014-12-23 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257940#comment-14257940 ] Sandy Ryza commented on SPARK-4921: --- Is there a barebones Spark program that I could use

[jira] [Updated] (SPARK-1714) Take advantage of AMRMClient APIs to simplify logic in YarnAllocationHandler

2014-12-22 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-1714: -- Target Version/s: 1.3.0 Affects Version/s: 1.2.0 Fix Version/s: (was: 1.2.0) Take

[jira] [Created] (SPARK-4911) Report the inputs and outputs of Spark jobs so that external systems can track data lineage

2014-12-20 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4911: - Summary: Report the inputs and outputs of Spark jobs so that external systems can track data lineage Key: SPARK-4911 URL: https://issues.apache.org/jira/browse/SPARK-4911

[jira] [Created] (SPARK-4885) Enable fetched blocks to exceed 2 GB by chaining buffers

2014-12-18 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4885: - Summary: Enable fetched blocks to exceed 2 GB by chaining buffers Key: SPARK-4885 URL: https://issues.apache.org/jira/browse/SPARK-4885 Project: Spark Issue Type:

[jira] [Updated] (SPARK-4885) Enable fetched blocks to exceed 2 GB

2014-12-18 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4885: -- Summary: Enable fetched blocks to exceed 2 GB (was: Enable fetched blocks to exceed 2 GB by chaining

[jira] [Commented] (SPARK-4885) Enable fetched blocks to exceed 2 GB

2014-12-18 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252962#comment-14252962 ] Sandy Ryza commented on SPARK-4885: --- One approach would be to deserialize the stream as

[jira] [Updated] (SPARK-4874) Report number of records read/written in a task

2014-12-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4874: -- Assignee: Kostas Sakellis Report number of records read/written in a task

[jira] [Updated] (SPARK-4843) Squash ExecutorRunnable and ExecutorRunnableUtil hierarchy in yarn module

2014-12-14 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4843: -- Assignee: Kostas Sakellis Squash ExecutorRunnable and ExecutorRunnableUtil hierarchy in yarn module

[jira] [Commented] (SPARK-4447) Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha

2014-12-09 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14239927#comment-14239927 ] Sandy Ryza commented on SPARK-4447: --- Hey [~andrewor14], yeah, actually have a patch I've

[jira] [Assigned] (SPARK-3779) yarn spark.yarn.applicationMaster.waitTries config should be changed to a time period

2014-12-09 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned SPARK-3779: - Assignee: Sandy Ryza yarn spark.yarn.applicationMaster.waitTries config should be changed to a

[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-08 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238628#comment-14238628 ] Sandy Ryza commented on SPARK-3655: --- The groupBy Iterable vs. TraversableOnce

[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235758#comment-14235758 ] Sandy Ryza commented on SPARK-3655: --- Hey [~koert], I think the transform that would most

[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235795#comment-14235795 ] Sandy Ryza commented on SPARK-3655: --- The repartitionAndSortWithinPartitions approach

[jira] [Created] (SPARK-4770) spark.scheduler.minRegisteredResourcesRatio documented default is incorrect for YARN

2014-12-05 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4770: - Summary: spark.scheduler.minRegisteredResourcesRatio documented default is incorrect for YARN Key: SPARK-4770 URL: https://issues.apache.org/jira/browse/SPARK-4770

[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236131#comment-14236131 ] Sandy Ryza commented on SPARK-3655: --- foldLeft only conceptually makes sense when applied

[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2014-12-05 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14236278#comment-14236278 ] Sandy Ryza commented on SPARK-3655: --- Thanks Koert, will take a look soon. Can we

[jira] [Created] (SPARK-4716) Avoid shuffle when all-to-all operation has single input and output partition

2014-12-03 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4716: - Summary: Avoid shuffle when all-to-all operation has single input and output partition Key: SPARK-4716 URL: https://issues.apache.org/jira/browse/SPARK-4716 Project: Spark

[jira] [Commented] (SPARK-4687) SparkContext#addFile doesn't keep file folder information

2014-12-03 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233385#comment-14233385 ] Sandy Ryza commented on SPARK-4687: --- [~pwendell], do you think this is a reasonable API

[jira] [Commented] (SPARK-4630) Dynamically determine optimal number of partitions

2014-11-30 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229371#comment-14229371 ] Sandy Ryza commented on SPARK-4630: --- Hey [~pwendell], Spark deals much better with large

[jira] [Resolved] (SPARK-4376) Put external modules behind build profiles

2014-11-26 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza resolved SPARK-4376. --- Resolution: Duplicate Put external modules behind build profiles

[jira] [Commented] (SPARK-4628) Put all external projects behind a build flag

2014-11-26 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226722#comment-14226722 ] Sandy Ryza commented on SPARK-4628: --- This looks like a duplicate of SPARK-4376.

[jira] [Updated] (SPARK-4630) Dynamically determine optimal number of partitions

2014-11-26 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4630: -- Assignee: Kostas Sakellis Dynamically determine optimal number of partitions

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-26 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226969#comment-14226969 ] Sandy Ryza commented on SPARK-4452: --- Thinking about the current change a little more, an

[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224185#comment-14224185 ] Sandy Ryza commented on SPARK-4584: --- I took a look at the jobs Nishkam ran before and

[jira] [Updated] (SPARK-4585) Spark dynamic scaling executors use upper limit value as default.

2014-11-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4585: -- Issue Type: Improvement (was: Bug) Spark dynamic scaling executors use upper limit value as default.

[jira] [Commented] (SPARK-4585) Spark dynamic scaling executors use upper limit value as default.

2014-11-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224257#comment-14224257 ] Sandy Ryza commented on SPARK-4585: --- I was discussing this with [~brocknoland]. The

[jira] [Assigned] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza reassigned SPARK-4584: - Assignee: Sandy Ryza 2x Performance regression for Spark-on-YARN

[jira] [Updated] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-25 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4584: -- Assignee: Marcelo Vanzin (was: Sandy Ryza) 2x Performance regression for Spark-on-YARN
