[jira] [Created] (SPARK-20863) Add metrics/instrumentation to LiveListenerBus

2017-05-23 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-20863: -- Summary: Add metrics/instrumentation to LiveListenerBus Key: SPARK-20863 URL: https://issues.apache.org/jira/browse/SPARK-20863 Project: Spark Issue Type: New

[jira] [Comment Edited] (SPARK-20178) Improve Scheduler fetch failures

2017-05-22 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020166#comment-16020166 ] Josh Rosen edited comment on SPARK-20178 at 5/22/17 8:54 PM: - Sure, let me

[jira] [Commented] (SPARK-20178) Improve Scheduler fetch failures

2017-05-22 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020166#comment-16020166 ] Josh Rosen commented on SPARK-20178: Sure, let me clarify: When a FetchFailure occurs, the

[jira] [Issue Comment Deleted] (SPARK-20178) Improve Scheduler fetch failures

2017-05-22 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-20178: --- Comment: was deleted (was: Sure, let me clarify: * When a FetchFailure occurs, the DAGScheduler

[jira] [Commented] (SPARK-20178) Improve Scheduler fetch failures

2017-05-22 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020141#comment-16020141 ] Josh Rosen commented on SPARK-20178: Sure, let me clarify: * When a FetchFailure occurs, the

[jira] [Commented] (SPARK-20840) Misleading spurious errors when there are Javadoc (Unidoc) breaks

2017-05-22 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16020026#comment-16020026 ] Josh Rosen commented on SPARK-20840: [~hyukjin.kwon], I'm not sure. One thing that I would consider

[jira] [Updated] (SPARK-20845) Support specification of column names in INSERT INTO

2017-05-22 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-20845: --- Priority: Minor (was: Major) > Support specification of column names in INSERT INTO >

[jira] [Created] (SPARK-20845) Support specification of column names in INSERT INTO

2017-05-22 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-20845: -- Summary: Support specification of column names in INSERT INTO Key: SPARK-20845 URL: https://issues.apache.org/jira/browse/SPARK-20845 Project: Spark Issue Type:

[jira] [Created] (SPARK-20841) Support table column aliases in FROM clause

2017-05-22 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-20841: -- Summary: Support table column aliases in FROM clause Key: SPARK-20841 URL: https://issues.apache.org/jira/browse/SPARK-20841 Project: Spark Issue Type:

[jira] [Commented] (SPARK-20178) Improve Scheduler fetch failures

2017-05-21 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16019123#comment-16019123 ] Josh Rosen commented on SPARK-20178: Looking over a few of the tickets linked to this fetch failure

[jira] [Updated] (SPARK-20832) Standalone master should explicitly inform drivers of worker deaths and invalidate external shuffle service outputs

2017-05-21 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-20832: --- Description: In SPARK-17370 (a patch authored by [~ekhliang] and reviewed by me), we added logic to

[jira] [Updated] (SPARK-20832) Standalone master should explicitly inform drivers of worker deaths and invalidate external shuffle service outputs

2017-05-21 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-20832: --- Component/s: Deploy > Standalone master should explicitly inform drivers of worker deaths and >

[jira] [Created] (SPARK-20832) Standalone master should explicitly inform drivers of worker deaths and invalidate external shuffle service outputs

2017-05-21 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-20832: -- Summary: Standalone master should explicitly inform drivers of worker deaths and invalidate external shuffle service outputs Key: SPARK-20832 URL:

[jira] [Commented] (SPARK-18838) High latency of event processing for large jobs

2017-05-18 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016663#comment-16016663 ] Josh Rosen commented on SPARK-18838: [~bOOmX], do you have CPU-time profiling within each of those

[jira] [Resolved] (SPARK-14584) Improve recognition of non-nullability in Dataset transformations

2017-05-17 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-14584. Resolution: Fixed Fix Version/s: 2.2.0 > Improve recognition of non-nullability in Dataset

[jira] [Commented] (SPARK-14584) Improve recognition of non-nullability in Dataset transformations

2017-05-17 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16014728#comment-16014728 ] Josh Rosen commented on SPARK-14584: [~maropu], yep, I think we can close it. I'll mark as fixed by

[jira] [Resolved] (SPARK-19555) Improve inefficient StringUtils.escapeLikeRegex() method

2017-05-17 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-19555. Resolution: Fixed Fix Version/s: 2.1.1 2.2.0 Fixed by SPARK-17647 in

[jira] [Commented] (SPARK-18838) High latency of event processing for large jobs

2017-05-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16013317#comment-16013317 ] Josh Rosen commented on SPARK-18838: I think that SPARK-20776 /

[jira] [Updated] (SPARK-20776) Fix JobProgressListener perf. problems caused by empty TaskMetrics initialization

2017-05-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-20776: --- Attachment: (was: screenshot-1.png) > Fix JobProgressListener perf. problems caused by empty

[jira] [Updated] (SPARK-20776) Fix JobProgressListener perf. problems caused by empty TaskMetrics initialization

2017-05-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-20776: --- Description: In {code} ./bin/spark-shell --master=local[64] {code} I ran {code}

[jira] [Updated] (SPARK-20776) Fix JobProgressListener perf. problems caused by empty TaskMetrics initialization

2017-05-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-20776: --- Description: In {code} ./bin/spark-shell --master=local[64] {code} I ran {code}

[jira] [Updated] (SPARK-20776) Fix JobProgressListener perf. problems caused by empty TaskMetrics initialization

2017-05-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-20776: --- Summary: Fix JobProgressListener perf. problems caused by empty TaskMetrics initialization (was:

[jira] [Created] (SPARK-20776) Fix performance problems in TaskMetrics.nameToAccums map initialization

2017-05-16 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-20776: -- Summary: Fix performance problems in TaskMetrics.nameToAccums map initialization Key: SPARK-20776 URL: https://issues.apache.org/jira/browse/SPARK-20776 Project: Spark

[jira] [Updated] (SPARK-20776) Fix performance problems in TaskMetrics.nameToAccums map initialization

2017-05-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-20776: --- Attachment: screenshot-1.png > Fix performance problems in TaskMetrics.nameToAccums map

[jira] [Created] (SPARK-20715) MapStatuses shouldn't be redundantly stored in both ShuffleMapStage and MapOutputTracker

2017-05-11 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-20715: -- Summary: MapStatuses shouldn't be redundantly stored in both ShuffleMapStage and MapOutputTracker Key: SPARK-20715 URL: https://issues.apache.org/jira/browse/SPARK-20715

[jira] [Updated] (SPARK-20700) InferFiltersFromConstraints stackoverflows for query (v2)

2017-05-10 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-20700: --- Description: The following (complicated) query eventually fails with a stack overflow during

[jira] [Updated] (SPARK-20700) InferFiltersFromConstraints stackoverflows for query (v2)

2017-05-10 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-20700: --- Summary: InferFiltersFromConstraints stackoverflows for query (v2) (was: Expression

[jira] [Created] (SPARK-20700) Expression canonicalization hits stack overflow for query

2017-05-10 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-20700: -- Summary: Expression canonicalization hits stack overflow for query Key: SPARK-20700 URL: https://issues.apache.org/jira/browse/SPARK-20700 Project: Spark Issue

[jira] [Created] (SPARK-20686) PropagateEmptyRelation incorrectly handles aggregate without grouping expressions

2017-05-09 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-20686: -- Summary: PropagateEmptyRelation incorrectly handles aggregate without grouping expressions Key: SPARK-20686 URL: https://issues.apache.org/jira/browse/SPARK-20686

[jira] [Assigned] (SPARK-20685) BatchPythonEvaluation UDF evaluator fails for case of single UDF with repeated argument

2017-05-09 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reassigned SPARK-20685: -- Assignee: Josh Rosen > BatchPythonEvaluation UDF evaluator fails for case of single UDF with

[jira] [Created] (SPARK-20685) BatchPythonEvaluation UDF evaluator fails for case of single UDF with repeated argument

2017-05-09 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-20685: -- Summary: BatchPythonEvaluation UDF evaluator fails for case of single UDF with repeated argument Key: SPARK-20685 URL: https://issues.apache.org/jira/browse/SPARK-20685

[jira] [Commented] (SPARK-14584) Improve recognition of non-nullability in Dataset transformations

2017-05-09 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16003082#comment-16003082 ] Josh Rosen commented on SPARK-14584: [~hyukjin.kwon], yeah, this does appear to be fixed. My only

[jira] [Created] (SPARK-20573) --packages fails when transitive dependency can only be resolved from repository specified in POM's tag

2017-05-02 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-20573: -- Summary: --packages fails when transitive dependency can only be resolved from repository specified in POM's tag Key: SPARK-20573 URL:

[jira] [Commented] (SPARK-4836) Web UI should display separate information for all stage attempts

2017-05-02 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994213#comment-15994213 ] Josh Rosen commented on SPARK-4836: --- [~ckadner], I'm pretty sure that this is still a problem.

[jira] [Commented] (SPARK-10878) Race condition when resolving Maven coordinates via Ivy

2017-05-02 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993888#comment-15993888 ] Josh Rosen commented on SPARK-10878: [~jeeyoungk], my understanding is that there are two possible

[jira] [Created] (SPARK-20453) Bump master branch version to 2.3.0-SNAPSHOT

2017-04-24 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-20453: -- Summary: Bump master branch version to 2.3.0-SNAPSHOT Key: SPARK-20453 URL: https://issues.apache.org/jira/browse/SPARK-20453 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-18406) Race between end-of-task and completion iterator read lock release

2017-04-15 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970235#comment-15970235 ] Josh Rosen commented on SPARK-18406: I can see how allowing user-level code to call setTaskContext()

[jira] [Created] (SPARK-20329) Resolution error when HAVING clause uses GROUP BY expression that involves implicit type coercion

2017-04-13 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-20329: -- Summary: Resolution error when HAVING clause uses GROUP BY expression that involves implicit type coercion Key: SPARK-20329 URL: https://issues.apache.org/jira/browse/SPARK-20329

[jira] [Commented] (SPARK-18692) Test Java 8 unidoc build on Jenkins master builder

2017-03-29 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947627#comment-15947627 ] Josh Rosen commented on SPARK-18692: We can't get the full Jekyll doc build running until we have

[jira] [Resolved] (SPARK-20102) Fix two minor build script issues blocking 2.1.1 RC + master snapshot builds

2017-03-27 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-20102. Resolution: Fixed Fix Version/s: 2.2.0 2.1.1 Fixed for 2.1.1 and master.

[jira] [Created] (SPARK-20102) Fix two minor build script issues blocking 2.1.1 RC + master snapshot builds

2017-03-26 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-20102: -- Summary: Fix two minor build script issues blocking 2.1.1 RC + master snapshot builds Key: SPARK-20102 URL: https://issues.apache.org/jira/browse/SPARK-20102 Project:

[jira] [Commented] (SPARK-19496) to_date with format has weird behavior

2017-03-23 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939508#comment-15939508 ] Josh Rosen commented on SPARK-19496: Let's make sure to document this clearly in the release notes. I

[jira] [Updated] (SPARK-19496) to_date with format has weird behavior

2017-03-23 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-19496: --- Labels: release-notes (was: ) > to_date with format has weird behavior >

[jira] [Created] (SPARK-19758) Casting string to timestamp in inline table definition fails with AnalysisException

2017-02-27 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-19758: -- Summary: Casting string to timestamp in inline table definition fails with AnalysisException Key: SPARK-19758 URL: https://issues.apache.org/jira/browse/SPARK-19758

[jira] [Comment Edited] (SPARK-12945) ERROR LiveListenerBus: Listener JobProgressListener threw an exception

2017-02-27 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886397#comment-15886397 ] Josh Rosen edited comment on SPARK-12945 at 2/27/17 7:46 PM: - I'm still

[jira] [Updated] (SPARK-12945) ERROR LiveListenerBus: Listener JobProgressListener threw an exception

2017-02-27 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-12945: --- Affects Version/s: 2.0.2 > ERROR LiveListenerBus: Listener JobProgressListener threw an exception >

[jira] [Commented] (SPARK-12945) ERROR LiveListenerBus: Listener JobProgressListener threw an exception

2017-02-27 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15886397#comment-15886397 ] Josh Rosen commented on SPARK-12945: I'm still seeing this error intermittently on Spark 2.0.3

[jira] [Created] (SPARK-19691) Calculating percentile of decimal column fails with ClassCastException

2017-02-21 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-19691: -- Summary: Calculating percentile of decimal column fails with ClassCastException Key: SPARK-19691 URL: https://issues.apache.org/jira/browse/SPARK-19691 Project: Spark

[jira] [Commented] (SPARK-19685) PipedRDD tasks should not hang on interruption / errors

2017-02-21 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876828#comment-15876828 ] Josh Rosen commented on SPARK-19685: By the way, one simple fix here might be to use the "soft-kill

[jira] [Created] (SPARK-19685) PipedRDD tasks should not hang on interruption / errors

2017-02-21 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-19685: -- Summary: PipedRDD tasks should not hang on interruption / errors Key: SPARK-19685 URL: https://issues.apache.org/jira/browse/SPARK-19685 Project: Spark Issue

[jira] [Commented] (SPARK-14658) when executor lost DagScheduer may submit one stage twice even if the first running taskset for this stage is not finished

2017-02-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870816#comment-15870816 ] Josh Rosen commented on SPARK-14658: Here's the logs from my reproduction, excerpted down to only the

[jira] [Updated] (SPARK-14658) when executor lost DagScheduer may submit one stage twice even if the first running taskset for this stage is not finished

2017-02-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-14658: --- Description: {code} 16/04/14 15:35:22 ERROR DAGSchedulerEventProcessLoop:

[jira] [Commented] (SPARK-14658) when executor lost DagScheduer may submit one stage twice even if the first running taskset for this stage is not finished

2017-02-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870777#comment-15870777 ] Josh Rosen commented on SPARK-14658: [~srowen], I think that [~yixiaohua] is right here: it looks

[jira] [Updated] (SPARK-14658) when executor lost DagScheduer may submit one stage twice even if the first running taskset for this stage is not finished

2017-02-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-14658: --- Affects Version/s: 2.2.0 2.0.0 2.1.0 > when executor

[jira] [Updated] (SPARK-14658) when executor lost DagScheduer may submit one stage twice even if the first running taskset for this stage is not finished

2017-02-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-14658: --- Component/s: (was: Spark Core) Scheduler > when executor lost DagScheduer may

[jira] [Reopened] (SPARK-14658) when executor lost DagScheduer may submit one stage twice even if the first running taskset for this stage is not finished

2017-02-16 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reopened SPARK-14658: > when executor lost DagScheduer may submit one stage twice even if the first > running taskset for

[jira] [Resolved] (SPARK-19529) TransportClientFactory.createClient() shouldn't call awaitUninterruptibly()

2017-02-14 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-19529. Resolution: Fixed > TransportClientFactory.createClient() shouldn't call awaitUninterruptibly() >

[jira] [Updated] (SPARK-19529) TransportClientFactory.createClient() shouldn't call awaitUninterruptibly()

2017-02-14 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-19529: --- Fix Version/s: 1.6.4 > TransportClientFactory.createClient() shouldn't call awaitUninterruptibly() >

[jira] [Updated] (SPARK-19529) TransportClientFactory.createClient() shouldn't call awaitUninterruptibly()

2017-02-14 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-19529: --- Target Version/s: 1.6.4, 2.0.3, 2.1.1, 2.2.0 (was: 1.6.3, 2.0.3, 2.1.1, 2.2.0) >

[jira] [Updated] (SPARK-19529) TransportClientFactory.createClient() shouldn't call awaitUninterruptibly()

2017-02-14 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-19529: --- Fix Version/s: 2.2.0 2.1.1 2.0.3 >

[jira] [Commented] (SPARK-12661) Drop Python 2.6 support in PySpark

2017-02-13 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864834#comment-15864834 ] Josh Rosen commented on SPARK-12661: IIRC the Jenkins work was to make sure that we have the new

[jira] [Created] (SPARK-19555) Improve inefficient StringUtils.escapeLikeRegex() method

2017-02-10 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-19555: -- Summary: Improve inefficient StringUtils.escapeLikeRegex() method Key: SPARK-19555 URL: https://issues.apache.org/jira/browse/SPARK-19555 Project: Spark Issue

[jira] [Created] (SPARK-19529) TransportClientFactory.createClient() shouldn't call awaitUninterruptibly()

2017-02-08 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-19529: -- Summary: TransportClientFactory.createClient() shouldn't call awaitUninterruptibly() Key: SPARK-19529 URL: https://issues.apache.org/jira/browse/SPARK-19529 Project:

[jira] [Resolved] (SPARK-18866) Codegen fails with cryptic error if regexp_replace() output column is not aliased

2017-01-09 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-18866. Resolution: Duplicate Fix Version/s: 2.2.0 2.1.1 > Codegen fails with

[jira] [Commented] (SPARK-18866) Codegen fails with cryptic error if regexp_replace() output column is not aliased

2017-01-09 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15813172#comment-15813172 ] Josh Rosen commented on SPARK-18866: Yep, that's it. This should be fixed by Burak's patch. >

[jira] [Resolved] (SPARK-18952) regex strings not properly escaped in codegen for aggregations

2017-01-09 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-18952. Resolution: Fixed Assignee: Burak Yavuz Fix Version/s: 2.2.0

[jira] [Updated] (SPARK-19100) Schedule tasks in descending order of estimated input size / estimated task duration

2017-01-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-19100: --- Description: Say that you're scheduling a reduce phase and based on the map output sizes you have

[jira] [Created] (SPARK-19100) Schedule tasks in descending order of estimated input size / estimated task duration

2017-01-05 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-19100: -- Summary: Schedule tasks in descending order of estimated input size / estimated task duration Key: SPARK-19100 URL: https://issues.apache.org/jira/browse/SPARK-19100

[jira] [Updated] (SPARK-19093) Cached tables are not used in SubqueryExpression

2017-01-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-19093: --- Summary: Cached tables are not used in SubqueryExpression (was: LeftAntiJoin doesn't seem to

[jira] [Commented] (SPARK-19093) LeftAntiJoin doesn't seem to resolve cached tables on right side

2017-01-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803709#comment-15803709 ] Josh Rosen commented on SPARK-19093: I'm a bit too busy with other work to tackle this right now, so

[jira] [Commented] (SPARK-19093) LeftAntiJoin doesn't seem to resolve cached tables on right side

2017-01-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803623#comment-15803623 ] Josh Rosen commented on SPARK-19093: I'm not sure whether that's the case because I seemed to observe

[jira] [Commented] (SPARK-19091) createDataset(sc.parallelize(x: Seq)) should be equivalent to createDataset(x: Seq)

2017-01-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803117#comment-15803117 ] Josh Rosen commented on SPARK-19091: Given above comment, maybe my original JIRA here of better stats

[jira] [Commented] (SPARK-19091) createDataset(sc.parallelize(x: Seq)) should be equivalent to createDataset(x: Seq)

2017-01-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15803105#comment-15803105 ] Josh Rosen commented on SPARK-19091: This is a pretty easy change but it does impact things slightly

[jira] [Updated] (SPARK-19091) createDataset(sc.parallelize(x: Seq)) should be equivalent to createDataset(x: Seq)

2017-01-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-19091: --- Description: It turns out that spark.createDataset(sc.parallelize(x: Seq)) and

[jira] [Issue Comment Deleted] (SPARK-19091) createDataset(sc.parallelize(x: Seq)) should be equivalent to createDataset(x: Seq)

2017-01-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-19091: --- Comment: was deleted (was: Upon closer inspection, I think the right approach here might be to

[jira] [Updated] (SPARK-19091) createDataset(sc.parallelize(x: Seq)) should be equivalent to createDataset(x: Seq)

2017-01-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-19091: --- Summary: createDataset(sc.parallelize(x: Seq)) should be equivalent to createDataset(x: Seq) (was:

[jira] [Updated] (SPARK-19091) Implement more accurate statistics for LogicalRDD when child is a mapped ParallelCollectionRDD

2017-01-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-19091: --- Description: The Catalyst optimizer uses LogicalRDD to represent scans from existing RDDs. In

[jira] [Created] (SPARK-19093) LeftAntiJoin doesn't seem to resolve cached tables on right side

2017-01-05 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-19093: -- Summary: LeftAntiJoin doesn't seem to resolve cached tables on right side Key: SPARK-19093 URL: https://issues.apache.org/jira/browse/SPARK-19093 Project: Spark

[jira] [Commented] (SPARK-19091) Implement more accurate statistics for LogicalRDD when child is a mapped ParallelCollectionRDD

2017-01-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15802938#comment-15802938 ] Josh Rosen commented on SPARK-19091: Upon closer inspection, I think the right approach here might be

[jira] [Updated] (SPARK-19091) Implement more accurate statistics for LogicalRDD when child is a mapped ParallelCollectionRDD

2017-01-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-19091: --- Description: The Catalyst optimizer uses LogicalRDD to represent scans from existing RDDs. In

[jira] [Updated] (SPARK-19091) Implement more accurate statistics for LogicalRDD when child is a mapped ParallelCollectionRDD

2017-01-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-19091: --- Description: The Catalyst optimizer uses LogicalRDD to represent scans from existing RDDs. In

[jira] [Updated] (SPARK-19091) Implement more accurate statistics for LogicalRDD when child is a mapped ParallelCollectionRDD

2017-01-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-19091: --- Description: The Catalyst optimizer uses LogicalRDD to represent scans from existing RDDs. In

[jira] [Created] (SPARK-19091) Implement more accurate statistics for LogicalRDD when child is ParallelCollectionRDD

2017-01-05 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-19091: -- Summary: Implement more accurate statistics for LogicalRDD when child is ParallelCollectionRDD Key: SPARK-19091 URL: https://issues.apache.org/jira/browse/SPARK-19091

[jira] [Updated] (SPARK-19091) Implement more accurate statistics for LogicalRDD when child is a mapped ParallelCollectionRDD

2017-01-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-19091: --- Summary: Implement more accurate statistics for LogicalRDD when child is a mapped

[jira] [Commented] (SPARK-18866) Codegen fails with cryptic error if regexp_replace() output column is not aliased

2017-01-03 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795814#comment-15795814 ] Josh Rosen commented on SPARK-18866: I think this is a duplicate of SPARK-18952 > Codegen fails with

[jira] [Commented] (SPARK-19044) PySpark dropna() can fail with AnalysisException

2016-12-31 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15790208#comment-15790208 ] Josh Rosen commented on SPARK-19044: In fact, this is an instance of a more general problem when

[jira] [Created] (SPARK-19044) PySpark dropna() can fail with AnalysisException

2016-12-31 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-19044: -- Summary: PySpark dropna() can fail with AnalysisException Key: SPARK-19044 URL: https://issues.apache.org/jira/browse/SPARK-19044 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-18928) FileScanRDD, JDBCRDD, and UnsafeSorter should support task cancellation

2016-12-19 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-18928: -- Summary: FileScanRDD, JDBCRDD, and UnsafeSorter should support task cancellation Key: SPARK-18928 URL: https://issues.apache.org/jira/browse/SPARK-18928 Project: Spark

[jira] [Commented] (SPARK-17892) Query in CTAS is Optimized Twice (branch-2.0)

2016-12-12 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15742575#comment-15742575 ] Josh Rosen commented on SPARK-17892: In the future, please re-use the existing JIRA when backporting

[jira] [Commented] (SPARK-17409) Query in CTAS is Optimized Twice

2016-12-12 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15742577#comment-15742577 ] Josh Rosen commented on SPARK-17409: This was actually fixed in branch-2.x as well, via SPARK-17892.

[jira] [Created] (SPARK-18761) Uncancellable / unkillable tasks may starve jobs of resources

2016-12-06 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-18761: -- Summary: Uncancellable / unkillable tasks may starve jobs of resources Key: SPARK-18761 URL: https://issues.apache.org/jira/browse/SPARK-18761 Project: Spark

[jira] [Updated] (SPARK-14660) Executors show up active tasks indefinitely after stage is killed

2016-12-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-14660: --- Component/s: Scheduler > Executors show up active tasks indefinitely after stage is killed >

[jira] [Updated] (SPARK-14932) Allow DataFrame.replace() to replace values with None

2016-12-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-14932: --- Labels: starter (was: ) > Allow DataFrame.replace() to replace values with None >

[jira] [Commented] (SPARK-14932) Allow DataFrame.replace() to replace values with None

2016-12-05 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15723620#comment-15723620 ] Josh Rosen commented on SPARK-14932: I think that there's a similar issue impacting the Scala / Java

[jira] [Commented] (SPARK-18692) Test Java 8 unidoc build on Jenkins master builder

2016-12-02 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15716758#comment-15716758 ] Josh Rosen commented on SPARK-18692: +1; this would be great to have. I believe that this may get

[jira] [Commented] (SPARK-18640) Fix minor synchronization issue in TaskSchedulerImpl.runningTasksByExecutors

2016-12-02 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15715712#comment-15715712 ] Josh Rosen commented on SPARK-18640: Actually, this doesn't look necessary because that method isn't

[jira] [Commented] (SPARK-18640) Fix minor synchronization issue in TaskSchedulerImpl.runningTasksByExecutors

2016-12-02 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15715690#comment-15715690 ] Josh Rosen commented on SPARK-18640: I'm also going to backport this into branch-1.6. > Fix minor

[jira] [Resolved] (SPARK-18553) Executor loss may cause TaskSetManager to be leaked

2016-12-01 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-18553. Resolution: Fixed Fix Version/s: 1.6.4 > Executor loss may cause TaskSetManager to be

[jira] [Updated] (SPARK-18362) Use TextFileFormat in implementation of CSVFileFormat

2016-11-30 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-18362: --- Summary: Use TextFileFormat in implementation of CSVFileFormat (was: Use TextFileFormat in

[jira] [Updated] (SPARK-18362) Use TextFileFormat in implementation of CSVFileFormat

2016-11-30 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-18362: --- Description: Spark's CSVFileFormat data source uses inefficient methods for reading files during
