[jira] [Assigned] (SPARK-13277) ANTLR ignores other rule using the USING keyword
[ https://issues.apache.org/jira/browse/SPARK-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13277: Assignee: (was: Apache Spark) > ANTLR ignores other rule using the USING keyword > > > Key: SPARK-13277 > URL: https://issues.apache.org/jira/browse/SPARK-13277 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Herman van Hovell >Priority: Minor > > ANTLR currently emits the following warning during compilation: > {noformat} > warning(200): org/apache/spark/sql/catalyst/parser/SparkSqlParser.g:938:7: > Decision can match input such as "KW_USING Identifier" using multiple > alternatives: 2, 3 > As a result, alternative(s) 3 were disabled for that input > {noformat} > This means that some of the functionality of the parser is disabled. This is > introduced by the migration of the DDLParsers > (https://github.com/apache/spark/pull/10723). We should be able to fix this > by introducing a syntactic predicate for USING. > cc [~viirya] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13277) ANTLR ignores other rule using the USING keyword
[ https://issues.apache.org/jira/browse/SPARK-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142376#comment-15142376 ] Apache Spark commented on SPARK-13277: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/11168 > ANTLR ignores other rule using the USING keyword > > > Key: SPARK-13277 > URL: https://issues.apache.org/jira/browse/SPARK-13277 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Herman van Hovell >Priority: Minor > > ANTLR currently emits the following warning during compilation: > {noformat} > warning(200): org/apache/spark/sql/catalyst/parser/SparkSqlParser.g:938:7: > Decision can match input such as "KW_USING Identifier" using multiple > alternatives: 2, 3 > As a result, alternative(s) 3 were disabled for that input > {noformat} > This means that some of the functionality of the parser is disabled. This is > introduced by the migration of the DDLParsers > (https://github.com/apache/spark/pull/10723). We should be able to fix this > by introducing a syntactic predicate for USING. > cc [~viirya] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13277) ANTLR ignores other rule using the USING keyword
[ https://issues.apache.org/jira/browse/SPARK-13277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13277: Assignee: Apache Spark > ANTLR ignores other rule using the USING keyword > > > Key: SPARK-13277 > URL: https://issues.apache.org/jira/browse/SPARK-13277 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Herman van Hovell >Assignee: Apache Spark >Priority: Minor > > ANTLR currently emits the following warning during compilation: > {noformat} > warning(200): org/apache/spark/sql/catalyst/parser/SparkSqlParser.g:938:7: > Decision can match input such as "KW_USING Identifier" using multiple > alternatives: 2, 3 > As a result, alternative(s) 3 were disabled for that input > {noformat} > This means that some of the functionality of the parser is disabled. This is > introduced by the migration of the DDLParsers > (https://github.com/apache/spark/pull/10723). We should be able to fix this > by introducing a syntactic predicate for USING. > cc [~viirya] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13270) Improve readability of whole stage codegen by skipping empty lines and outputting the pipeline plan
[ https://issues.apache.org/jira/browse/SPARK-13270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13270. - Resolution: Fixed Assignee: Nong Li Fix Version/s: 2.0.0 > Improve readability of whole stage codegen by skipping empty lines and > outputting the pipeline plan > --- > > Key: SPARK-13270 > URL: https://issues.apache.org/jira/browse/SPARK-13270 > Project: Spark > Issue Type: Bug >Reporter: Nong Li >Assignee: Nong Li > Fix For: 2.0.0 > > > It would be nice to comment the generated function with the pipeline it is > for, particularly for complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13235) Remove an Extra Distinct in Union
[ https://issues.apache.org/jira/browse/SPARK-13235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-13235. --- Resolution: Fixed Assignee: Xiao Li Fix Version/s: 2.0.0 > Remove an Extra Distinct in Union > - > > Key: SPARK-13235 > URL: https://issues.apache.org/jira/browse/SPARK-13235 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.0.0 > > > Union Distinct has two Distinct that generate two Aggregation in the plan. > {code} > sql("select * from t0 union select * from t0").explain(true) > {code} > {code} > == Parsed Logical Plan == > 'Project [unresolvedalias(*,None)] > +- 'Subquery u_2 >+- 'Distinct > +- 'Project [unresolvedalias(*,None)] > +- 'Subquery u_1 > +- 'Distinct >+- 'Union > :- 'Project [unresolvedalias(*,None)] > : +- 'UnresolvedRelation `t0`, None > +- 'Project [unresolvedalias(*,None)] > +- 'UnresolvedRelation `t0`, None > == Analyzed Logical Plan == > id: bigint > Project [id#16L] > +- Subquery u_2 >+- Distinct > +- Project [id#16L] > +- Subquery u_1 > +- Distinct >+- Union > :- Project [id#16L] > : +- Subquery t0 > : +- Relation[id#16L] ParquetRelation > +- Project [id#16L] > +- Subquery t0 > +- Relation[id#16L] ParquetRelation > == Optimized Logical Plan == > Aggregate [id#16L], [id#16L] > +- Aggregate [id#16L], [id#16L] >+- Union > :- Project [id#16L] > : +- Relation[id#16L] ParquetRelation > +- Project [id#16L] > +- Relation[id#16L] ParquetRelation > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
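For SPARK-13235 above, a minimal hypothetical Catalyst-style rule sketches the idea: collapse the redundant inner Distinct before the Distinct-to-Aggregate rewrite produces two Aggregates. The rule name is made up and the actual fix may live elsewhere in the analyzer/planner.
{code}
import org.apache.spark.sql.catalyst.plans.logical.{Distinct, LogicalPlan}
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical rule: two adjacent Distincts are equivalent to one.
object CollapseRedundantDistinct extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case Distinct(Distinct(child)) => Distinct(child)
  }
}
{code}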
[jira] [Created] (SPARK-13281) Switch broadcast of RDD to exception from warning
holdenk created SPARK-13281: --- Summary: Switch broadcast of RDD to exception from warning Key: SPARK-13281 URL: https://issues.apache.org/jira/browse/SPARK-13281 Project: Spark Issue Type: Improvement Reporter: holdenk Priority: Trivial As noted in the comments, we log a warning when a user tries to broadcast an RDD, for compatibility with old programs which may have broadcast RDDs without using the resulting broadcast variable. Since we're moving to 2.0, now seems like a good opportunity to replace that warning with an exception rather than depending on the developer finding the warning message. Related to https://issues.apache.org/jira/browse/SPARK-5063 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
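A rough sketch of the kind of eager check SPARK-13281 proposes — illustrative only, with a hypothetical helper name rather than the real SparkContext.broadcast change:
{code}
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

// Fail fast instead of only logging a warning when the value being broadcast is an RDD.
def assertNotBroadcastingRdd[T: ClassTag](value: T): Unit = {
  val cls = implicitly[ClassTag[T]].runtimeClass
  if (classOf[RDD[_]].isAssignableFrom(cls)) {
    throw new IllegalArgumentException(
      "Can not directly broadcast RDDs; instead, call collect() and broadcast the result.")
  }
}
{code}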
[jira] [Resolved] (SPARK-13276) Parse Table Identifiers/Expression skips bad characters at the end of the passed string
[ https://issues.apache.org/jira/browse/SPARK-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-13276. --- Resolution: Fixed Assignee: Herman van Hovell Fix Version/s: 2.0.0 > Parse Table Identifiers/Expression skips bad characters at the end of the > passed string > --- > > Key: SPARK-13276 > URL: https://issues.apache.org/jira/browse/SPARK-13276 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Herman van Hovell >Assignee: Herman van Hovell >Priority: Minor > Fix For: 2.0.0 > > > Both the ParseDriver.parseTableName/parseExpression methods currently allow > the passed command to end with any kind of (bad) characters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13234) Remove duplicated SQL metrics
[ https://issues.apache.org/jira/browse/SPARK-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-13234. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11163 [https://github.com/apache/spark/pull/11163] > Remove duplicated SQL metrics > - > > Key: SPARK-13234 > URL: https://issues.apache.org/jira/browse/SPARK-13234 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu > Fix For: 2.0.0 > > > For lots of SQL operators, we have metrics for both of input and output, the > number of input rows should be exactly the number of output rows of child, we > could only have metrics for output rows. > After we improve the performance using whole stage codegen, the overhead of > SQL metrics are not trivial anymore, we should avoid that if it's not > necessary. > Some of the operator does not have SQL metrics, we should add that for them. > For those operators that have the same number of rows from input and output > (for example, Projection, we may don't need that). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13061) Error in spark rest api application info for job names contains spaces
[ https://issues.apache.org/jira/browse/SPARK-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142318#comment-15142318 ] Avihoo Mamka commented on SPARK-13061: -- I'm executing this request using Python requests {code:xml} http://spark.mysite.com:8088/ws/v1/cluster/apps/?state=RUNNING {code} Then I convert my result to json and get this result: {code:xml} {u'apps': {u'app': [{u'runningContainers': 61, u'allocatedVCores': 61, u'clusterId': 1448371222831, u'amContainerLogs': u'http://ip-10-20-1-246:8042/node/containerlogs/container_1448371222831_2557_01_01/hadoop', u'id': u'application_1448371222831_2557', u'preemptedResourceMB': 0, u'finishedTime': 0, u'numAMContainerPreempted': 0, u'user': u'hadoop', u'preemptedResourceVCores': 0, u'startedTime': 1455170737855, u'elapsedTime': 1769164, u'state': u'RUNNING', u'numNonAMContainerPreempted': 0, u'progress': 10.0, u'trackingUI': u'ApplicationMaster', u'trackingUrl': u'http://ip-10-20-1-104:20888/proxy/application_1448371222831_2557/', u'allocatedMB': 553984, u'amHostHttpAddress': u'ip-10-20-1-246:8042', u'memorySeconds': 1520893, u'applicationTags': u'', u'name': u'Spark shell', u'queue': u'default', u'vcoreSeconds': 134, u'applicationType': u'SPARK', u'diagnostics': u'', u'finalStatus': u'UNDEFINED'}]}} {code} I then extract the value of key {code:xml}apps{code} and extract the value of key {code:xml}app{code} inside. So right now I have this json array: {code:xml} [{u'runningContainers': 61, u'allocatedVCores': 61, u'clusterId': 1448371222831, u'amContainerLogs': u'http://ip-10-20-1-246:8042/node/containerlogs/container_1448371222831_2557_01_01/hadoop', u'id': u'application_1448371222831_2557', u'preemptedResourceMB': 0, u'finishedTime': 0, u'numAMContainerPreempted': 0, u'user': u'hadoop', u'preemptedResourceVCores': 0, u'startedTime': 1455170737855, u'elapsedTime': 1769164, u'state': u'RUNNING', u'numNonAMContainerPreempted': 0, u'progress': 10.0, u'trackingUI': u'ApplicationMaster', u'trackingUrl': u'http://ip-10-20-1-104:20888/proxy/application_1448371222831_2557/', u'allocatedMB': 553984, u'amHostHttpAddress': u'ip-10-20-1-246:8042', u'memorySeconds': 1520893, u'applicationTags': u'', u'name': u'Spark shell', u'queue': u'default', u'vcoreSeconds': 134, u'applicationType': u'SPARK', u'diagnostics': u'', u'finalStatus': u'UNDEFINED'}] {code} I then run in for loop and for each item in the array, I extract the {code:xml}id{code} and {code:xml}name{code} In the above example it will be: id -> application_1448371222831_2557 and name -> Spark shell Now I execute this rest call: {code:xml} http://spark.mysite.com:20888/proxy/application_1448371222831_2557/api/v1/applications/Spark shell/jobs/0 {code} And then I get this result: {code:xml} Spark shell Not Found {code} > Error in spark rest api application info for job names contains spaces > -- > > Key: SPARK-13061 > URL: https://issues.apache.org/jira/browse/SPARK-13061 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2 >Reporter: Avihoo Mamka >Priority: Trivial > Labels: rest_api, spark > > When accessing spark rest api with application id to get job specific id > status, a job with name containing whitespaces are being encoded to '%20' and > therefore the rest api returns `no such app`. 
> For example: > http://spark.mysite.com:20888/proxy/application_1447676402999_1254/api/v1/applications/ > returns: > [ { > "id" : "Spark shell", > "name" : "Spark shell", > "attempts" : [ { > "startTime" : "2016-01-28T09:20:58.526GMT", > "endTime" : "1969-12-31T23:59:59.999GMT", > "sparkUser" : "", > "completed" : false > } ] > } ] > and then when accessing: > http://spark.mysite.com:20888/proxy/application_1447676402999_1254/api/v1/applications/Spark > shell/ > the result returned is: > unknown app: Spark%20shell -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
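To make the SPARK-13061 failure mode concrete (host and application id taken from the report above): a path segment containing a space gets percent-encoded to %20 somewhere along the way, so the endpoint ends up comparing "Spark%20shell" against the stored name "Spark shell" unless it URL-decodes the segment first. A small Scala illustration of where the %20 comes from:
{code}
import java.net.URLEncoder

val appName = "Spark shell"
// URLEncoder targets form encoding, so spaces become '+'; switch them to %20 for a path segment.
val encoded = URLEncoder.encode(appName, "UTF-8").replace("+", "%20")
// encoded == "Spark%20shell"
val url = s"http://spark.mysite.com:20888/proxy/application_1448371222831_2557/" +
  s"api/v1/applications/$encoded/jobs/0"
{code}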
[jira] [Commented] (SPARK-13260) count(*) does not work with CSV data source
[ https://issues.apache.org/jira/browse/SPARK-13260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142295#comment-15142295 ] Thomas Sebastian commented on SPARK-13260: -- Hi [~falaki], could you give more details about this issue? Do you see any difference in the count? > count(*) does not work with CSV data source > --- > > Key: SPARK-13260 > URL: https://issues.apache.org/jira/browse/SPARK-13260 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Hossein Falaki > > Column pruning in the CSV data source seems to omit all columns when we run > the following query: > {code} > select count(*) from csvTable > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
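A hypothetical repro sketch for SPARK-13260 (file path, header option and table name are made up; assumes the built-in CSV source in 2.0):
{code}
val df = sqlContext.read
  .format("csv")
  .option("header", "true")
  .load("/tmp/people.csv")
df.registerTempTable("csvTable")

// Expected: the row count of the file; reported: column pruning leaves no columns to scan.
sqlContext.sql("select count(*) from csvTable").show()
{code}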
[jira] [Assigned] (SPARK-13279) Spark driver stuck holding a global lock when there are 200k tasks submitted in a stage
[ https://issues.apache.org/jira/browse/SPARK-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13279: Assignee: (was: Apache Spark) > Spark driver stuck holding a global lock when there are 200k tasks submitted > in a stage > --- > > Key: SPARK-13279 > URL: https://issues.apache.org/jira/browse/SPARK-13279 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Sital Kedia > Fix For: 1.6.0 > > > While running a large pipeline with 200k tasks, we found that the executors > were not able to register with the driver because the driver was stuck > holding a global lock in TaskSchedulerImpl.submitTasks function. > jstack of the driver - http://pastebin.com/m8CP6VMv > executor log - http://pastebin.com/2NPS1mXC > From the jstack I see that the thread handing the resource offer from > executors (dispatcher-event-loop-9) is blocked on a lock held by the thread > "dag-scheduler-event-loop", which is iterating over an entire ArrayBuffer > when adding a pending tasks. So when we have 200k pending tasks, because of > this o(n2) operations, the driver is just hung for more than 5 minutes. > Solution - In addPendingTask function, we don't really need a duplicate > check. It's okay if we add a task to the same queue twice because > dequeueTaskFromList will skip already-running tasks. > Please note that this is a regression from Spark 1.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
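A simplified sketch of the change SPARK-13279 proposes for TaskSetManager.addPendingTask (structure and names are illustrative, not the actual patch): drop the linear contains() check, since dequeueTaskFromList already skips tasks that are running.
{code}
import scala.collection.mutable.{ArrayBuffer, HashMap}

val pendingTasksForHost = new HashMap[String, ArrayBuffer[Int]]

def addPendingTask(host: String, index: Int): Unit = {
  val list = pendingTasksForHost.getOrElseUpdate(host, new ArrayBuffer[Int])
  // Before: if (!list.contains(index)) list += index   // O(n) per task, O(n^2) for the stage
  list += index                                          // duplicates are tolerated downstream
}
{code}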
[jira] [Assigned] (SPARK-13279) Spark driver stuck holding a global lock when there are 200k tasks submitted in a stage
[ https://issues.apache.org/jira/browse/SPARK-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13279: Assignee: Apache Spark > Spark driver stuck holding a global lock when there are 200k tasks submitted > in a stage > --- > > Key: SPARK-13279 > URL: https://issues.apache.org/jira/browse/SPARK-13279 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Sital Kedia >Assignee: Apache Spark > Fix For: 1.6.0 > > > While running a large pipeline with 200k tasks, we found that the executors > were not able to register with the driver because the driver was stuck > holding a global lock in TaskSchedulerImpl.submitTasks function. > jstack of the driver - http://pastebin.com/m8CP6VMv > executor log - http://pastebin.com/2NPS1mXC > From the jstack I see that the thread handing the resource offer from > executors (dispatcher-event-loop-9) is blocked on a lock held by the thread > "dag-scheduler-event-loop", which is iterating over an entire ArrayBuffer > when adding a pending tasks. So when we have 200k pending tasks, because of > this o(n2) operations, the driver is just hung for more than 5 minutes. > Solution - In addPendingTask function, we don't really need a duplicate > check. It's okay if we add a task to the same queue twice because > dequeueTaskFromList will skip already-running tasks. > Please note that this is a regression from Spark 1.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13279) Spark driver stuck holding a global lock when there are 200k tasks submitted in a stage
[ https://issues.apache.org/jira/browse/SPARK-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142286#comment-15142286 ] Apache Spark commented on SPARK-13279: -- User 'sitalkedia' has created a pull request for this issue: https://github.com/apache/spark/pull/11167 > Spark driver stuck holding a global lock when there are 200k tasks submitted > in a stage > --- > > Key: SPARK-13279 > URL: https://issues.apache.org/jira/browse/SPARK-13279 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Sital Kedia > Fix For: 1.6.0 > > > While running a large pipeline with 200k tasks, we found that the executors > were not able to register with the driver because the driver was stuck > holding a global lock in TaskSchedulerImpl.submitTasks function. > jstack of the driver - http://pastebin.com/m8CP6VMv > executor log - http://pastebin.com/2NPS1mXC > From the jstack I see that the thread handing the resource offer from > executors (dispatcher-event-loop-9) is blocked on a lock held by the thread > "dag-scheduler-event-loop", which is iterating over an entire ArrayBuffer > when adding a pending tasks. So when we have 200k pending tasks, because of > this o(n2) operations, the driver is just hung for more than 5 minutes. > Solution - In addPendingTask function, we don't really need a duplicate > check. It's okay if we add a task to the same queue twice because > dequeueTaskFromList will skip already-running tasks. > Please note that this is a regression from Spark 1.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13279) Spark driver stuck holding a global lock when there are 200k tasks submitted in a stage
[ https://issues.apache.org/jira/browse/SPARK-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sital Kedia updated SPARK-13279: Description: While running a large pipeline with 200k tasks, we found that the executors were not able to register with the driver because the driver was stuck holding a global lock in TaskSchedulerImpl.submitTasks function. jstack of the driver - http://pastebin.com/m8CP6VMv executor log - http://pastebin.com/2NPS1mXC >From the jstack I see that the thread handing the resource offer from >executors (dispatcher-event-loop-9) is blocked on a lock held by the thread >"dag-scheduler-event-loop", which is iterating over an entire ArrayBuffer when >adding a pending tasks. So when we have 200k pending tasks, because of this >o(n2) operations, the driver is just hung for more than 5 minutes. Solution - In addPendingTask function, we don't really need a duplicate check. It's okay if we add a task to the same queue twice because dequeueTaskFromList will skip already-running tasks. Please note that this is a regression from Spark 1.5. was: While running a large pipeline with 200k tasks, we found that the executors were not able to register with the driver because the driver was stuck holding a global lock in TaskSchedulerImpl.submitTasks function. jstack of the driver - http://pastebin.com/m8CP6VMv executor log - http://pastebin.com/2NPS1mXC >From the jstack I see that the thread handing the resource offer from >executors (dispatcher-event-loop-9) is blocked on a lock held by the thread >"dag-scheduler-event-loop", which is iterating over an entire ArrayBuffer when >adding a pending tasks. So when we have 200k pending tasks, because of this >o(n2) operations, the driver is just hung for more than 5 minutes. Solution - In addPendingTask function, we don't really need a duplicate check. It's okay if we add a task to the same queue twice because dequeueTaskFromList will skip already-running tasks. > Spark driver stuck holding a global lock when there are 200k tasks submitted > in a stage > --- > > Key: SPARK-13279 > URL: https://issues.apache.org/jira/browse/SPARK-13279 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Sital Kedia > Fix For: 1.6.0 > > > While running a large pipeline with 200k tasks, we found that the executors > were not able to register with the driver because the driver was stuck > holding a global lock in TaskSchedulerImpl.submitTasks function. > jstack of the driver - http://pastebin.com/m8CP6VMv > executor log - http://pastebin.com/2NPS1mXC > From the jstack I see that the thread handing the resource offer from > executors (dispatcher-event-loop-9) is blocked on a lock held by the thread > "dag-scheduler-event-loop", which is iterating over an entire ArrayBuffer > when adding a pending tasks. So when we have 200k pending tasks, because of > this o(n2) operations, the driver is just hung for more than 5 minutes. > Solution - In addPendingTask function, we don't really need a duplicate > check. It's okay if we add a task to the same queue twice because > dequeueTaskFromList will skip already-running tasks. > Please note that this is a regression from Spark 1.5. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13279) Spark driver stuck holding a global lock when there are 200k tasks submitted in a stage
[ https://issues.apache.org/jira/browse/SPARK-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sital Kedia updated SPARK-13279: Description: While running a large pipeline with 200k tasks, we found that the executors were not able to register with the driver because the driver was stuck holding a global lock in TaskSchedulerImpl.submitTasks function. jstack of the driver - http://pastebin.com/m8CP6VMv executor log - http://pastebin.com/2NPS1mXC >From the jstack I see that the thread handing the resource offer from >executors (dispatcher-event-loop-9) is blocked on a lock held by the thread >"dag-scheduler-event-loop", which is iterating over an entire ArrayBuffer when >adding a pending tasks. So when we have 200k pending tasks, because of this >o(n2) operations, the driver is just hung for more than 5 minutes. Solution - In addPendingTask function, we don't really need a duplicate check. It's okay if we add a task to the same queue twice because dequeueTaskFromList will skip already-running tasks. was: While running a large pipeline with 200k tasks, we found that the executors were not able to register with the driver because the driver was stuck holding a global lock in TaskSchedulerImpl.submitTasks function. jstack of the driver - http://pastebin.com/m8CP6VMv executor log - http://pastebin.com/2NPS1mXC >From the jstack I see that the thread handing the resource offer from >executors (dispatcher-event-loop-9) is blocked on a lock held by the thread >"dag-scheduler-event-loop", which is iterating over an entire ArrayBuffer when >adding a pending tasks. So when we have 200k pending tasks, because of this >o(n2) operations, the driver is just hung for more than 5 minutes. Solution - Instead of an ArrayBuffer, we can use a LinkedHashSet which will provide us o(1) lookup and also maintain the ordering. > Spark driver stuck holding a global lock when there are 200k tasks submitted > in a stage > --- > > Key: SPARK-13279 > URL: https://issues.apache.org/jira/browse/SPARK-13279 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Sital Kedia > Fix For: 1.6.0 > > > While running a large pipeline with 200k tasks, we found that the executors > were not able to register with the driver because the driver was stuck > holding a global lock in TaskSchedulerImpl.submitTasks function. > jstack of the driver - http://pastebin.com/m8CP6VMv > executor log - http://pastebin.com/2NPS1mXC > From the jstack I see that the thread handing the resource offer from > executors (dispatcher-event-loop-9) is blocked on a lock held by the thread > "dag-scheduler-event-loop", which is iterating over an entire ArrayBuffer > when adding a pending tasks. So when we have 200k pending tasks, because of > this o(n2) operations, the driver is just hung for more than 5 minutes. > Solution - In addPendingTask function, we don't really need a duplicate > check. It's okay if we add a task to the same queue twice because > dequeueTaskFromList will skip already-running tasks. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13268) SQL Timestamp stored as GMT but toString returns GMT-08:00
[ https://issues.apache.org/jira/browse/SPARK-13268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142267#comment-15142267 ] Jayadevan M commented on SPARK-13268: - @Ilya Ganelin Can you tell how you import ZonedDateTime in your program and java version ? > SQL Timestamp stored as GMT but toString returns GMT-08:00 > -- > > Key: SPARK-13268 > URL: https://issues.apache.org/jira/browse/SPARK-13268 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Ilya Ganelin > > There is an issue with how timestamps are displayed/converted to Strings in > Spark SQL. The documentation states that the timestamp should be created in > the GMT time zone, however, if we do so, we see that the output actually > contains a -8 hour offset: > {code} > new > Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT]").toInstant.toEpochMilli) > res144: java.sql.Timestamp = 2014-12-31 16:00:00.0 > new > Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT-08:00]").toInstant.toEpochMilli) > res145: java.sql.Timestamp = 2015-01-01 00:00:00.0 > {code} > This result is confusing, unintuitive, and introduces issues when converting > from DataFrames containing timestamps to RDDs which are then saved as text. > This has the effect of essentially shifting all dates in a dataset by 1 day. > The suggested fix for this is to update the timestamp toString representation > to either a) Include timezone or b) Correctly display in GMT. > This change may well introduce substantial and insidious bugs so I'm not sure > how best to resolve this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
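A small self-contained illustration of the behaviour described in SPARK-13268: the epoch millis stored in the Timestamp are time-zone agnostic, and it is Timestamp.toString that formats them in the JVM default time zone.
{code}
import java.sql.Timestamp
import java.text.SimpleDateFormat
import java.util.TimeZone

val ts = new Timestamp(1420070400000L)   // 2015-01-01T00:00:00Z
println(ts)                              // "2014-12-31 16:00:00.0" when the default zone is GMT-08:00

val gmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
gmt.setTimeZone(TimeZone.getTimeZone("GMT"))
println(gmt.format(ts))                  // "2015-01-01 00:00:00" regardless of the default zone
{code}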
[jira] [Resolved] (SPARK-12706) support grouping/grouping_id function together group set
[ https://issues.apache.org/jira/browse/SPARK-12706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-12706. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10677 [https://github.com/apache/spark/pull/10677] > support grouping/grouping_id function together group set > > > Key: SPARK-12706 > URL: https://issues.apache.org/jira/browse/SPARK-12706 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu > Fix For: 2.0.0 > > > https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup#EnhancedAggregation,Cube,GroupingandRollup-Grouping__IDfunction > http://etutorials.org/SQL/Mastering+Oracle+SQL/Chapter+13.+Advanced+Group+Operations/13.3+The+GROUPING_ID+and+GROUP_ID+Functions/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
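An illustrative query exercising the support added by SPARK-12706 (table and columns are hypothetical; see the linked Hive and Oracle pages for the exact semantics):
{code}
// grouping(col) is 1 when col has been aggregated away in a grouping set;
// grouping_id() identifies which grouping set the output row belongs to.
sqlContext.sql("""
  SELECT city, product, grouping(city), grouping_id(), sum(amount)
  FROM sales
  GROUP BY city, product GROUPING SETS ((city, product), (city), ())
""").show()
{code}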
[jira] [Updated] (SPARK-13205) SQL generation support for self join
[ https://issues.apache.org/jira/browse/SPARK-13205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian updated SPARK-13205: --- Assignee: Xiao Li > SQL generation support for self join > > > Key: SPARK-13205 > URL: https://issues.apache.org/jira/browse/SPARK-13205 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.0.0 > > > SQL generation does not support the Self Join. > {code}SELECT x.key FROM t1 x JOIN t1 y ON x.key = y.key{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13205) SQL generation support for self join
[ https://issues.apache.org/jira/browse/SPARK-13205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-13205. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11084 [https://github.com/apache/spark/pull/11084] > SQL generation support for self join > > > Key: SPARK-13205 > URL: https://issues.apache.org/jira/browse/SPARK-13205 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.0.0 > > > SQL generation does not support the Self Join. > {code}SELECT x.key FROM t1 x JOIN t1 y ON x.key = y.key{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13280) FileBasedWriteAheadLog logger name should be under o.a.s namespace
[ https://issues.apache.org/jira/browse/SPARK-13280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13280: Assignee: (was: Apache Spark) > FileBasedWriteAheadLog logger name should be under o.a.s namespace > -- > > Key: SPARK-13280 > URL: https://issues.apache.org/jira/browse/SPARK-13280 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 2.0.0 >Reporter: Marcelo Vanzin >Priority: Minor > > The logger name in FileBasedWriteAheadLog is currently defined as: > {code} > override protected val logName = s"WriteAheadLogManager $callerNameTag" > {code} > That has two problems: > - It's not under the usual "org.apache.spark" namespace so changing the > logging configuration for that package does not affect it > - we've seen cases where {{$callerNameTag}} was empty, in which case the > logger name would have a trailing space, making it impossible to disable it > using a properties file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
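A minimal sketch of one way to address both points in SPARK-13280 — keep the logger under the class's own org.apache.spark name and only append the tag when it is non-empty. This is a drop-in replacement for the quoted line, not the actual PR, which may take a different approach:
{code}
override protected val logName = {
  // e.g. "org.apache.spark.streaming.util.FileBasedWriteAheadLog"
  val base = getClass.getName.stripSuffix("$")
  if (callerNameTag.isEmpty) base else s"$base $callerNameTag"
}
{code}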
[jira] [Commented] (SPARK-13280) FileBasedWriteAheadLog logger name should be under o.a.s namespace
[ https://issues.apache.org/jira/browse/SPARK-13280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142179#comment-15142179 ] Apache Spark commented on SPARK-13280: -- User 'vanzin' has created a pull request for this issue: https://github.com/apache/spark/pull/11165 > FileBasedWriteAheadLog logger name should be under o.a.s namespace > -- > > Key: SPARK-13280 > URL: https://issues.apache.org/jira/browse/SPARK-13280 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 2.0.0 >Reporter: Marcelo Vanzin >Priority: Minor > > The logger name in FileBasedWriteAheadLog is currently defined as: > {code} > override protected val logName = s"WriteAheadLogManager $callerNameTag" > {code} > That has two problems: > - It's not under the usual "org.apache.spark" namespace so changing the > logging configuration for that package does not affect it > - we've seen cases where {{$callerNameTag}} was empty, in which case the > logger name would have a trailing space, making it impossible to disable it > using a properties file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13280) FileBasedWriteAheadLog logger name should be under o.a.s namespace
[ https://issues.apache.org/jira/browse/SPARK-13280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13280: Assignee: Apache Spark > FileBasedWriteAheadLog logger name should be under o.a.s namespace > -- > > Key: SPARK-13280 > URL: https://issues.apache.org/jira/browse/SPARK-13280 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 2.0.0 >Reporter: Marcelo Vanzin >Assignee: Apache Spark >Priority: Minor > > The logger name in FileBasedWriteAheadLog is currently defined as: > {code} > override protected val logName = s"WriteAheadLogManager $callerNameTag" > {code} > That has two problems: > - It's not under the usual "org.apache.spark" namespace so changing the > logging configuration for that package does not affect it > - we've seen cases where {{$callerNameTag}} was empty, in which case the > logger name would have a trailing space, making it impossible to disable it > using a properties file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13280) FileBasedWriteAheadLog logger name should be under o.a.s namespace
Marcelo Vanzin created SPARK-13280: -- Summary: FileBasedWriteAheadLog logger name should be under o.a.s namespace Key: SPARK-13280 URL: https://issues.apache.org/jira/browse/SPARK-13280 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 2.0.0 Reporter: Marcelo Vanzin Priority: Minor The logger name in FileBasedWriteAheadLog is currently defined as: {code} override protected val logName = s"WriteAheadLogManager $callerNameTag" {code} That has two problems: - It's not under the usual "org.apache.spark" namespace so changing the logging configuration for that package does not affect it - we've seen cases where {{$callerNameTag}} was empty, in which case the logger name would have a trailing space, making it impossible to disable it using a properties file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12725) SQL generation suffers from name conficts introduced by some analysis rules
[ https://issues.apache.org/jira/browse/SPARK-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-12725. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11050 [https://github.com/apache/spark/pull/11050] > SQL generation suffers from name conficts introduced by some analysis rules > --- > > Key: SPARK-12725 > URL: https://issues.apache.org/jira/browse/SPARK-12725 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Cheng Lian >Assignee: Xiao Li > Fix For: 2.0.0 > > > Some analysis rules generate auxiliary attribute references with the same > name but different expression IDs. For example, {{ResolveAggregateFunctions}} > introduces {{havingCondition}} and {{aggOrder}}, and > {{DistinctAggregationRewriter}} introduces {{gid}}. > This is OK for normal query execution since these attribute references get > expression IDs. However, it's troublesome when converting resolved query > plans back to SQL query strings since expression IDs are erased. > Here's an example Spark 1.6.0 snippet for illustration: > {code} > sqlContext.range(10).select('id as 'a, 'id as 'b).registerTempTable("t") > sqlContext.sql("SELECT SUM(a) FROM t GROUP BY a, b ORDER BY COUNT(a), > COUNT(b)").explain(true) > {code} > The above code produces the following resolved plan: > {noformat} > == Analyzed Logical Plan == > _c0: bigint > Project [_c0#101L] > +- Sort [aggOrder#102L ASC,aggOrder#103L ASC], true >+- Aggregate [a#47L,b#48L], [(sum(a#47L),mode=Complete,isDistinct=false) > AS _c0#101L,(count(a#47L),mode=Complete,isDistinct=false) AS > aggOrder#102L,(count(b#48L),mode=Complete,isDistinct=false) AS aggOrder#103L] > +- Subquery t > +- Project [id#46L AS a#47L,id#46L AS b#48L] > +- LogicalRDD [id#46L], MapPartitionsRDD[44] at range at > :26 > {noformat} > Here we can see that both aggregate expressions in {{ORDER BY}} are extracted > into an {{Aggregate}} operator, and both of them are named {{aggOrder}} with > different expression IDs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
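One way to sidestep the clash in SPARK-12725 when converting a resolved plan back to SQL — purely a sketch, assuming the generator is free to rewrite attribute names — is to fold the expression ID into a generated name, so erasing the IDs still leaves the columns distinct:
{code}
import org.apache.spark.sql.catalyst.expressions.Attribute

// e.g. aggOrder#102L and aggOrder#103L become gen_attr_102 and gen_attr_103 in the generated SQL.
def normalizedName(attr: Attribute): String = s"gen_attr_${attr.exprId.id}"
{code}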
[jira] [Commented] (SPARK-12675) Executor dies because of ClassCastException and causes timeout
[ https://issues.apache.org/jira/browse/SPARK-12675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142120#comment-15142120 ] Sven Krasser commented on SPARK-12675: -- I'm running into the same issue (same exception) running locally using Spark 1.6.0. There are just 200 partitions in my case. My job fails in local mode, but it eventually completes in local\[*\] mode (but in either case the exception occurs during processing). [~josephkb], any suggestions on where to go from here -- reopen? The {{ClassCastException}} certainly looks suspicious. > Executor dies because of ClassCastException and causes timeout > -- > > Key: SPARK-12675 > URL: https://issues.apache.org/jira/browse/SPARK-12675 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0, 2.0.0 > Environment: 64-bit Linux Ubuntu 15.10, 16GB RAM, 8 cores 3ghz >Reporter: Alexandru Rosianu >Priority: Minor > > I'm trying to fit a Spark ML pipeline but my executor dies. Here's the script > which doesn't work (a bit simplified): > {code:title=Script.scala} > // Prepare data sets > logInfo("Getting datasets") > val emoTrainingData = > sqlc.read.parquet("/tw/sentiment/emo/parsed/data.parquet") > val trainingData = emoTrainingData > // Configure the pipeline > val pipeline = new Pipeline().setStages(Array( > new > FeatureReducer().setInputCol("raw_text").setOutputCol("reduced_text"), > new StringSanitizer().setInputCol("reduced_text").setOutputCol("text"), > new Tokenizer().setInputCol("text").setOutputCol("raw_words"), > new StopWordsRemover().setInputCol("raw_words").setOutputCol("words"), > new HashingTF().setInputCol("words").setOutputCol("features"), > new NaiveBayes().setSmoothing(0.5).setFeaturesCol("features"), > new ColumnDropper().setDropColumns("raw_text", "reduced_text", "text", > "raw_words", "words", "features") > )) > // Fit the pipeline > logInfo(s"Training model on ${trainingData.count()} rows") > val model = pipeline.fit(trainingData) > {code} > It executes up to the last line. It prints "Training model on xx rows", then > it starts fitting, the executor dies, the drivers doesn't receive heartbeats > from the executor and it times out, then the script exits. It doesn't get > past that line. 
> This is the exception that kills the executor: > {code} > java.io.IOException: java.lang.ClassCastException: cannot assign instance > of scala.collection.immutable.HashMap$SerializationProxy to field > org.apache.spark.executor.TaskMetrics._accumulatorUpdates of type > scala.collection.immutable.Map in instance of > org.apache.spark.executor.TaskMetrics > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1207) > at > org.apache.spark.executor.TaskMetrics.readObject(TaskMetrics.scala:219) > at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058) > at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) > at org.apache.spark.util.Utils$.deserialize(Utils.scala:92) > at > org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$reportHeartBeat$1$$anonfun$apply$6.apply(Executor.scala:436) > at > org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$reportHeartBeat$1$$anonfun$apply$6.apply(Executor.scala:426) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$reportHeartBeat$1.apply(Executor.scala:426) > at > org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$reportHeartBeat$1.apply(Executor.scala:424) > at scala.collection.Iterator$class.foreach(Iterator.scala:742) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1194) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at scala.collection.AbstractIterable.foreach(Iterable.scala:54) > at > org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:424) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:468) > at > org.apache.spark.executor.Ex
[jira] [Updated] (SPARK-13274) Fix Aggregator Links on GroupedDataset Scala API
[ https://issues.apache.org/jira/browse/SPARK-13274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13274: Fix Version/s: 1.6.1 > Fix Aggregator Links on GroupedDataset Scala API > - > > Key: SPARK-13274 > URL: https://issues.apache.org/jira/browse/SPARK-13274 > Project: Spark > Issue Type: Documentation > Components: Documentation >Reporter: Raela Wang >Assignee: Raela Wang >Priority: Trivial > Fix For: 1.6.1, 2.0.0 > > > Update Scala API docs for GroupedDataset. Links in flatMapGroups() and > mapGroups() are pointing to the wrong Aggregator. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13274) Fix Aggregator Links on GroupedDataset Scala API
[ https://issues.apache.org/jira/browse/SPARK-13274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13274. - Resolution: Fixed Assignee: Raela Wang Fix Version/s: 2.0.0 > Fix Aggregator Links on GroupedDataset Scala API > - > > Key: SPARK-13274 > URL: https://issues.apache.org/jira/browse/SPARK-13274 > Project: Spark > Issue Type: Documentation > Components: Documentation >Reporter: Raela Wang >Assignee: Raela Wang >Priority: Trivial > Fix For: 2.0.0 > > > Update Scala API docs for GroupedDataset. Links in flatMapGroups() and > mapGroups() are pointing to the wrong Aggregator. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13279) Spark driver stuck holding a global lock when there are 200k tasks submitted in a stage
[ https://issues.apache.org/jira/browse/SPARK-13279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sital Kedia updated SPARK-13279: Description: While running a large pipeline with 200k tasks, we found that the executors were not able to register with the driver because the driver was stuck holding a global lock in TaskSchedulerImpl.submitTasks function. jstack of the driver - http://pastebin.com/m8CP6VMv executor log - http://pastebin.com/2NPS1mXC >From the jstack I see that the thread handing the resource offer from >executors (dispatcher-event-loop-9) is blocked on a lock held by the thread >"dag-scheduler-event-loop", which is iterating over an entire ArrayBuffer when >adding a pending tasks. So when we have 200k pending tasks, because of this >o(n2) operations, the driver is just hung for more than 5 minutes. Solution - Instead of an ArrayBuffer, we can use a LinkedHashSet which will provide us o(1) lookup and also maintain the ordering. was: While running a large pipeline with 200k tasks, we found that the executors were not able to register with the driver because the driver was stuck holding a global lock in TaskSchedulerImpl.submitTasks function. jstack of the driver - http://pastebin.com/m8CP6VMv executor log - http://pastebin.com/2NPS1mXC >From the jstack I see that the thread handing the resource offer from >executors (dispatcher-event-loop-9) is blocked on a lock held by the thread >"dag-scheduler-event-loop" which is iterating over an entire ArrayBuffer when >adding a pending tasks. So when we have 200k pending tasks, because of this >o(n2) operations, the driver is just hung for more than 5 minutes. Solution - Instead of an ArrayBuffer, we can use a LinkedHashSet which will provide us o(1) lookup and also maintain the ordering. > Spark driver stuck holding a global lock when there are 200k tasks submitted > in a stage > --- > > Key: SPARK-13279 > URL: https://issues.apache.org/jira/browse/SPARK-13279 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Sital Kedia > Fix For: 1.6.0 > > > While running a large pipeline with 200k tasks, we found that the executors > were not able to register with the driver because the driver was stuck > holding a global lock in TaskSchedulerImpl.submitTasks function. > jstack of the driver - http://pastebin.com/m8CP6VMv > executor log - http://pastebin.com/2NPS1mXC > From the jstack I see that the thread handing the resource offer from > executors (dispatcher-event-loop-9) is blocked on a lock held by the thread > "dag-scheduler-event-loop", which is iterating over an entire ArrayBuffer > when adding a pending tasks. So when we have 200k pending tasks, because of > this o(n2) operations, the driver is just hung for more than 5 minutes. > Solution - Instead of an ArrayBuffer, we can use a LinkedHashSet which will > provide us o(1) lookup and also maintain the ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13279) Spark driver stuck holding a global lock when there are 200k tasks submitted in a stage
Sital Kedia created SPARK-13279: --- Summary: Spark driver stuck holding a global lock when there are 200k tasks submitted in a stage Key: SPARK-13279 URL: https://issues.apache.org/jira/browse/SPARK-13279 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.6.0 Reporter: Sital Kedia Fix For: 1.6.0 While running a large pipeline with 200k tasks, we found that the executors were not able to register with the driver because the driver was stuck holding a global lock in the TaskSchedulerImpl.submitTasks function. jstack of the driver - http://pastebin.com/m8CP6VMv executor log - http://pastebin.com/2NPS1mXC From the jstack I see that the thread handling the resource offer from executors (dispatcher-event-loop-9) is blocked on a lock held by the thread "dag-scheduler-event-loop", which is iterating over an entire ArrayBuffer when adding a pending task. So when we have 200k pending tasks, because of this O(n^2) operation, the driver is hung for more than 5 minutes. Solution - Instead of an ArrayBuffer, we can use a LinkedHashSet, which will provide O(1) lookup and also maintain the ordering. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13069) ActorHelper is not throttled by rate limiter
[ https://issues.apache.org/jira/browse/SPARK-13069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141998#comment-15141998 ] Lin Zhao commented on SPARK-13069: -- There also seems to be no way to specify bounded blocking mailbox for ActorReceiverSupervisor, which would solve this. Only way I can think of is adding a storeSync to ActorReceiver. > ActorHelper is not throttled by rate limiter > > > Key: SPARK-13069 > URL: https://issues.apache.org/jira/browse/SPARK-13069 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.6.0 >Reporter: Lin Zhao > > The rate an actor receiver sends data to spark is not limited by maxRate or > back pressure. Spark would control how fast it writes the data to block > manager, but the receiver actor sends events asynchronously and would fill > out akka mailbox with millions of events until memory runs out. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13069) ActorHelper is not throttled by rate limiter
[ https://issues.apache.org/jira/browse/SPARK-13069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141992#comment-15141992 ] Lin Zhao commented on SPARK-13069: -- I haven't tested with the master code, but looking at the source it almost certainly has the same issue. > ActorHelper is not throttled by rate limiter > > > Key: SPARK-13069 > URL: https://issues.apache.org/jira/browse/SPARK-13069 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.6.0 >Reporter: Lin Zhao > > The rate an actor receiver sends data to spark is not limited by maxRate or back pressure. Spark would control how fast it writes the data to block manager, but the receiver actor sends events asynchronously and would fill out akka mailbox with millions of events until memory runs out. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13234) Remove duplicated SQL metrics
[ https://issues.apache.org/jira/browse/SPARK-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141930#comment-15141930 ] Apache Spark commented on SPARK-13234: -- User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/11163 > Remove duplicated SQL metrics > - > > Key: SPARK-13234 > URL: https://issues.apache.org/jira/browse/SPARK-13234 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu > > For lots of SQL operators, we have metrics for both of input and output, the > number of input rows should be exactly the number of output rows of child, we > could only have metrics for output rows. > After we improve the performance using whole stage codegen, the overhead of > SQL metrics are not trivial anymore, we should avoid that if it's not > necessary. > Some of the operator does not have SQL metrics, we should add that for them. > For those operators that have the same number of rows from input and output > (for example, Projection, we may don't need that). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13234) Remove duplicated SQL metrics
[ https://issues.apache.org/jira/browse/SPARK-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13234: Assignee: Apache Spark > Remove duplicated SQL metrics > - > > Key: SPARK-13234 > URL: https://issues.apache.org/jira/browse/SPARK-13234 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >Assignee: Apache Spark > > For lots of SQL operators, we have metrics for both of input and output, the > number of input rows should be exactly the number of output rows of child, we > could only have metrics for output rows. > After we improve the performance using whole stage codegen, the overhead of > SQL metrics are not trivial anymore, we should avoid that if it's not > necessary. > Some of the operator does not have SQL metrics, we should add that for them. > For those operators that have the same number of rows from input and output > (for example, Projection, we may don't need that). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13234) Remove duplicated SQL metrics
[ https://issues.apache.org/jira/browse/SPARK-13234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13234: Assignee: (was: Apache Spark) > Remove duplicated SQL metrics > - > > Key: SPARK-13234 > URL: https://issues.apache.org/jira/browse/SPARK-13234 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu > > For lots of SQL operators, we have metrics for both of input and output, the > number of input rows should be exactly the number of output rows of child, we > could only have metrics for output rows. > After we improve the performance using whole stage codegen, the overhead of > SQL metrics are not trivial anymore, we should avoid that if it's not > necessary. > Some of the operator does not have SQL metrics, we should add that for them. > For those operators that have the same number of rows from input and output > (for example, Projection, we may don't need that). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13149) Add FileStreamSource
[ https://issues.apache.org/jira/browse/SPARK-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141917#comment-15141917 ] Apache Spark commented on SPARK-13149: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/11162 > Add FileStreamSource > > > Key: SPARK-13149 > URL: https://issues.apache.org/jira/browse/SPARK-13149 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13262) cannot coerce type 'environment' to vector of type 'list'
[ https://issues.apache.org/jira/browse/SPARK-13262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141877#comment-15141877 ] Shivaram Venkataraman commented on SPARK-13262: --- Can you paste the code you ran that led to the error? It would be great to have a small reproducible example. > cannot coerce type 'environment' to vector of type 'list' > - > > Key: SPARK-13262 > URL: https://issues.apache.org/jira/browse/SPARK-13262 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 1.5.2 >Reporter: Samuel Alexander > > Occasionally getting the following error while constructing a > dataframe in SparkR: > 16/02/09 13:28:06 WARN RBackendHandler: cannot find matching method class > org.apache.spark.sql.api.r.SQLUtils.dfToCols. Candidates are: > Error in as.vector(x, "list") : > cannot coerce type 'environment' to vector of type 'list' > Restarting SparkR fixed the error. > What is the cause of this issue? How can we solve it? > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13278) Launcher fails to start with JDK 9 EA
[ https://issues.apache.org/jira/browse/SPARK-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141868#comment-15141868 ] Apache Spark commented on SPARK-13278: -- User 'cl4es' has created a pull request for this issue: https://github.com/apache/spark/pull/11160 > Launcher fails to start with JDK 9 EA > - > > Key: SPARK-13278 > URL: https://issues.apache.org/jira/browse/SPARK-13278 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Claes Redestad > > CommandBuilderUtils.addPermGenSizeOpt needs to handle the JDK 9 version string > format, which can look like the expected "9", but also like "9-ea" or "9+100". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13278) Launcher fails to start with JDK 9 EA
[ https://issues.apache.org/jira/browse/SPARK-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13278: Assignee: (was: Apache Spark) > Launcher fails to start with JDK 9 EA > - > > Key: SPARK-13278 > URL: https://issues.apache.org/jira/browse/SPARK-13278 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Claes Redestad > > CommandBuilderUtils.addPermGenSizeOpt needs to handle the JDK 9 version string > format, which can look like the expected "9", but also like "9-ea" or "9+100". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13278) Launcher fails to start with JDK 9 EA
[ https://issues.apache.org/jira/browse/SPARK-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13278: Assignee: Apache Spark > Launcher fails to start with JDK 9 EA > - > > Key: SPARK-13278 > URL: https://issues.apache.org/jira/browse/SPARK-13278 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Claes Redestad >Assignee: Apache Spark > > CommandBuilderUtils.addPermGenSizeOpt needs to handle the JDK 9 version string > format, which can look like the expected "9", but also like "9-ea" or "9+100". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9438) restarting leader zookeeper causes spark master to die when the spark master election is assigned to zookeeper
[ https://issues.apache.org/jira/browse/SPARK-9438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141835#comment-15141835 ] Thomas Demoor commented on SPARK-9438: -- We have witnessed this as well in 1.3. Losing the ZK leader takes down the active Spark master. > restarting leader zookeeper causes spark master to die when the spark master > election is assigned to zookeeper > -- > > Key: SPARK-9438 > URL: https://issues.apache.org/jira/browse/SPARK-9438 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 > Environment: Spark 1.2.0 and ZooKeeper version: 3.4.6-1569965 >Reporter: Amir Rad > > When Spark master election is assigned to ZooKeeper, restarting the leader > ZooKeeper causes the Spark master to die. > Steps to reproduce: > Create a cluster of 3 Spark nodes. > Set Spark-env to: > SPARK_LOCAL_DIRS="/home/sparkcde/data_spark/data" > SPARK_MASTER_OPTS="-Dspark.deploy.spreadOut=false" > SPARK_WORKER_DIR="/home/sparkcde/data_spark/worker" > SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true" > SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER > -Dspark.deploy.zookeeper.url=s1:2181,s2:2181,s3:2181" > Identify the Spark master. > Identify the ZooKeeper leader. > Stop the ZooKeeper leader. > Check the Spark master: it is dead. > Start the ZooKeeper leader. > Check the Spark master: still dead. > If you continue the same pattern of stopping and starting the ZooKeeper leader, > you will eventually lose the whole Spark cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13278) Launcher fails to start with JDK 9 EA
Claes Redestad created SPARK-13278: -- Summary: Launcher fails to start with JDK 9 EA Key: SPARK-13278 URL: https://issues.apache.org/jira/browse/SPARK-13278 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.6.0 Reporter: Claes Redestad CommandBuilderUtils.addPermGenSizeOpt needs to handle the JDK 9 version string format, which can look like the expected "9", but also like "9-ea" or "9+100". -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
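For context, the failure is in how the launcher derives the major Java version (presumably before deciding whether the PermGen option still applies). A minimal Scala sketch of version parsing that tolerates the JDK 9 scheme is below; the object and method names are illustrative, not the actual CommandBuilderUtils code.
{code}
// Illustrative only: derive the major Java version from both the legacy "1.x.y_z"
// scheme and the JDK 9 scheme ("9", "9-ea", "9+100").
object JavaVersionSketch {
  def majorVersion(javaVersion: String): Int = {
    // Drop pre-release / build metadata introduced by the JDK 9 version scheme.
    val base = javaVersion.takeWhile(c => c != '-' && c != '+')
    val parts = base.split("\\.")
    // Pre-9 JDKs report "1.7.0_79" / "1.8.0_66"; JDK 9+ reports just "9", "10", ...
    if (parts(0) == "1") parts(1).toInt else parts(0).toInt
  }

  def main(args: Array[String]): Unit = {
    Seq("1.7.0_79", "1.8.0_66", "9", "9-ea", "9+100").foreach { v =>
      println(s"$v -> major version ${majorVersion(v)}")
    }
  }
}
{code}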
[jira] [Updated] (SPARK-13056) Map column would throw NPE if value is null
[ https://issues.apache.org/jira/browse/SPARK-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid updated SPARK-13056: - Assignee: Adrian Wang > Map column would throw NPE if value is null > --- > > Key: SPARK-13056 > URL: https://issues.apache.org/jira/browse/SPARK-13056 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Adrian Wang >Assignee: Adrian Wang > Fix For: 1.6.1, 2.0.0 > > > Create a map like > { "a": "somestring", > "b": null} > and query it like > SELECT col["b"] FROM t1; > An NPE will be thrown. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13056) Map column would throw NPE if value is null
[ https://issues.apache.org/jira/browse/SPARK-13056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid resolved SPARK-13056. -- Resolution: Fixed Fix Version/s: 2.0.0 1.6.1 [~marmbrus] [~adrian-wang] Looks like the JIRA wasn't updated when this was merged; I'm doing it manually now -- please update if I've made a mistake. Issue Resolved by pull request 10964 https://github.com/apache/spark/pull/10964 > Map column would throw NPE if value is null > --- > > Key: SPARK-13056 > URL: https://issues.apache.org/jira/browse/SPARK-13056 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Adrian Wang > Fix For: 1.6.1, 2.0.0 > > > Create a map like > { "a": "somestring", > "b": null} > and query it like > SELECT col["b"] FROM t1; > An NPE will be thrown. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
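To make the report concrete, here is a hedged spark-shell sketch of the kind of lookup described, using a Scala-built map column rather than the reporter's original table; whether this exact snippet trips the NPE on an unpatched 1.6.0 build is an assumption on my part.
{code}
import sqlContext.implicits._

// One row with a map column whose value for key "b" is null.
val df = Seq(Tuple1(Map("a" -> "somestring", "b" -> null))).toDF("col")
df.registerTempTable("t1")

// The report says this lookup throws a NullPointerException before the fix;
// after the fix it should simply return NULL.
sqlContext.sql("""SELECT col["b"] FROM t1""").show()
{code}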
[jira] [Commented] (SPARK-3789) [GRAPHX] Python bindings for GraphX
[ https://issues.apache.org/jira/browse/SPARK-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141761#comment-15141761 ] Ignacio tartavull commented on SPARK-3789: -- Is there any update on the status of this ticket? > [GRAPHX] Python bindings for GraphX > --- > > Key: SPARK-3789 > URL: https://issues.apache.org/jira/browse/SPARK-3789 > Project: Spark > Issue Type: New Feature > Components: GraphX, PySpark >Reporter: Ameet Talwalkar >Assignee: Kushal Datta > Attachments: PyGraphX_design_doc.pdf > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13277) ANTLR ignores other rule using the USING keyword
Herman van Hovell created SPARK-13277: - Summary: ANTLR ignores other rule using the USING keyword Key: SPARK-13277 URL: https://issues.apache.org/jira/browse/SPARK-13277 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Herman van Hovell Priority: Minor ANTLR currently emits the following warning during compilation: {noformat} warning(200): org/apache/spark/sql/catalyst/parser/SparkSqlParser.g:938:7: Decision can match input such as "KW_USING Identifier" using multiple alternatives: 2, 3 As a result, alternative(s) 3 were disabled for that input {noformat} This means that some of the functionality of the parser is disabled. This is introduced by the migration of the DDLParsers (https://github.com/apache/spark/pull/10723). We should be able to fix this by introducing a syntactic predicate for USING. cc [~viirya] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13276) Parse Table Identifiers/Expression skips bad characters at the end of the passed string
[ https://issues.apache.org/jira/browse/SPARK-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13276: Assignee: (was: Apache Spark) > Parse Table Identifiers/Expression skips bad characters at the end of the > passed string > --- > > Key: SPARK-13276 > URL: https://issues.apache.org/jira/browse/SPARK-13276 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Herman van Hovell >Priority: Minor > > Both the ParseDriver.parseTableName/parseExpression methods currently allow > the passed command to end with any kind of (bad) characters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13276) Parse Table Identifiers/Expression skips bad characters at the end of the passed string
[ https://issues.apache.org/jira/browse/SPARK-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13276: Assignee: Apache Spark > Parse Table Identifiers/Expression skips bad characters at the end of the > passed string > --- > > Key: SPARK-13276 > URL: https://issues.apache.org/jira/browse/SPARK-13276 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Herman van Hovell >Assignee: Apache Spark >Priority: Minor > > Both the ParseDriver.parseTableName/parseExpression methods currently allow > the passed command to end with any kind of (bad) characters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13276) Parse Table Identifiers/Expression skips bad characters at the end of the passed string
[ https://issues.apache.org/jira/browse/SPARK-13276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141733#comment-15141733 ] Apache Spark commented on SPARK-13276: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/11159 > Parse Table Identifiers/Expression skips bad characters at the end of the > passed string > --- > > Key: SPARK-13276 > URL: https://issues.apache.org/jira/browse/SPARK-13276 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Herman van Hovell >Priority: Minor > > Both the ParseDriver.parseTableName/parseExpression methods currently allow > the passed command to end with any kind of (bad) characters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13057) Add benchmark codes and the performance results for implemented compression schemes for InMemoryRelation
[ https://issues.apache.org/jira/browse/SPARK-13057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13057. - Resolution: Fixed Assignee: Takeshi Yamamuro Fix Version/s: 2.0.0 > Add benchmark codes and the performance results for implemented compression > schemes for InMemoryRelation > > > Key: SPARK-13057 > URL: https://issues.apache.org/jira/browse/SPARK-13057 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0 >Reporter: Takeshi Yamamuro >Assignee: Takeshi Yamamuro > Fix For: 2.0.0 > > > This ticket adds benchmark code for in-memory cache compression to make > future development and discussion smoother. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
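The benchmark itself lives in the Spark source tree; purely as a rough illustration of what timing a compressed in-memory columnar scan involves, a spark-shell sketch is below. The configuration key is the existing columnar-compression switch; the data set and timing approach are my own illustration, not the ticket's benchmark code.
{code}
// Enable columnar compression for cached data, materialize a cached DataFrame,
// then time a full scan over the in-memory relation.
sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true")

val df = sqlContext.range(0, 10000000L).selectExpr("id % 1000 AS k", "id AS v")
df.cache().count()                       // build the InMemoryRelation

val start = System.nanoTime()
df.groupBy("k").count().collect()        // force a scan of the cached, compressed columns
println(s"cached scan took ${(System.nanoTime() - start) / 1e6} ms")
{code}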
[jira] [Resolved] (SPARK-12414) Remove closure serializer
[ https://issues.apache.org/jira/browse/SPARK-12414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-12414. - Resolution: Fixed Assignee: Sean Owen (was: Andrew Or) Fix Version/s: 2.0.0 > Remove closure serializer > - > > Key: SPARK-12414 > URL: https://issues.apache.org/jira/browse/SPARK-12414 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Sean Owen > Fix For: 2.0.0 > > > There is a config `spark.closure.serializer` that accepts exactly one value: > the java serializer. This is because there are currently bugs in the Kryo > serializer that make it not a viable candidate. This was uncovered by an > unsuccessful attempt to make it work: SPARK-7708. > My high level point is that the Java serializer has worked well for at least > 6 Spark versions now, and it is an incredibly complicated task to get other > serializers (not just Kryo) to work with Spark's closures. IMO the effort is > not worth it and we should just remove this documentation and all the code > associated with it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13276) Parse Table Identifiers/Expression skips bad characters at the end of the passed string
Herman van Hovell created SPARK-13276: - Summary: Parse Table Identifiers/Expression skips bad characters at the end of the passed string Key: SPARK-13276 URL: https://issues.apache.org/jira/browse/SPARK-13276 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Herman van Hovell Priority: Minor Both the ParseDriver.parseTableName/parseExpression methods currently allow the passed command to end with any kind of (bad) characters. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13274) Fix Aggregator Links on GroupedDataset Scala API
[ https://issues.apache.org/jira/browse/SPARK-13274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141656#comment-15141656 ] Apache Spark commented on SPARK-13274: -- User 'raelawang' has created a pull request for this issue: https://github.com/apache/spark/pull/11158 > Fix Aggregator Links on GroupedDataset Scala API > - > > Key: SPARK-13274 > URL: https://issues.apache.org/jira/browse/SPARK-13274 > Project: Spark > Issue Type: Documentation > Components: Documentation >Reporter: Raela Wang >Priority: Trivial > > Update Scala API docs for GroupedDataset. Links in flatMapGroups() and > mapGroups() are pointing to the wrong Aggregator. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11714) Make Spark on Mesos honor port restrictions
[ https://issues.apache.org/jira/browse/SPARK-11714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141653#comment-15141653 ] Apache Spark commented on SPARK-11714: -- User 'skonto' has created a pull request for this issue: https://github.com/apache/spark/pull/11157 > Make Spark on Mesos honor port restrictions > --- > > Key: SPARK-11714 > URL: https://issues.apache.org/jira/browse/SPARK-11714 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Charles Allen > > Currently the MesosSchedulerBackend makes no effort to honor "ports" > as a resource in Mesos offers. The request is that the ports the > executor binds to respect the limits of the "ports" resource in an offer. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13266) Python DataFrameReader converts None to "None" instead of null
[ https://issues.apache.org/jira/browse/SPARK-13266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141619#comment-15141619 ] Shixiong Zhu commented on SPARK-13266: -- Could you submit a PR? > Python DataFrameReader converts None to "None" instead of null > -- > > Key: SPARK-13266 > URL: https://issues.apache.org/jira/browse/SPARK-13266 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 1.6.0 > Environment: Linux standalone but probably applies to all >Reporter: mathieu longtin > Labels: easyfix, patch > > If you do something like this: > {code:none} > tsv_loader = sqlContext.read.format('com.databricks.spark.csv') > tsv_loader.options(quote=None, escape=None) > {code} > The loader sees the string "None" as the _quote_ and _escape_ options. The > loader should get a _null_. > An easy fix is to modify *python/pyspark/sql/readwriter.py* near the top, > correct the _to_str_ function. Here's the patch: > {code:none} > diff --git a/python/pyspark/sql/readwriter.py > b/python/pyspark/sql/readwriter.py > index a3d7eca..ba18d13 100644 > --- a/python/pyspark/sql/readwriter.py > +++ b/python/pyspark/sql/readwriter.py > @@ -33,10 +33,12 @@ __all__ = ["DataFrameReader", "DataFrameWriter"] > def to_str(value): > """ > -A wrapper over str(), but convert bool values to lower case string > +A wrapper over str(), but convert bool values to lower case string, and > keep None > """ > if isinstance(value, bool): > return str(value).lower() > +elif value is None: > +return value > else: > return str(value) > {code} > This has been tested and works great. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13271) Better error message if 'path' is not specified
[ https://issues.apache.org/jira/browse/SPARK-13271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reassigned SPARK-13271: Assignee: Shixiong Zhu > Better error message if 'path' is not specified > --- > > Key: SPARK-13271 > URL: https://issues.apache.org/jira/browse/SPARK-13271 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > > As per discussion in > https://github.com/apache/spark/pull/11034#discussion_r52111238 > we should improve the error message: > {code} > scala> sqlContext.read.format("text").load() > java.util.NoSuchElementException: key not found: path > at scala.collection.MapLike$class.default(MapLike.scala:228) > at > org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.default(ddl.scala:159) > at scala.collection.MapLike$class.apply(MapLike.scala:141) > at > org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.apply(ddl.scala:159) > at > org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$10.apply(ResolvedDataSource.scala:200) > at > org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$10.apply(ResolvedDataSource.scala:200) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:200) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:129) > ... 49 elided > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13271) Better error message if 'path' is not specified
[ https://issues.apache.org/jira/browse/SPARK-13271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-13271: - Component/s: SQL > Better error message if 'path' is not specified > --- > > Key: SPARK-13271 > URL: https://issues.apache.org/jira/browse/SPARK-13271 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Shixiong Zhu > > As per discussion in > https://github.com/apache/spark/pull/11034#discussion_r52111238 > we should improve the error message: > {code} > scala> sqlContext.read.format("text").load() > java.util.NoSuchElementException: key not found: path > at scala.collection.MapLike$class.default(MapLike.scala:228) > at > org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.default(ddl.scala:159) > at scala.collection.MapLike$class.apply(MapLike.scala:141) > at > org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.apply(ddl.scala:159) > at > org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$10.apply(ResolvedDataSource.scala:200) > at > org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$10.apply(ResolvedDataSource.scala:200) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:200) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:129) > ... 49 elided > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13271) Better error message if 'path' is not specified
[ https://issues.apache.org/jira/browse/SPARK-13271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-13271: - Issue Type: Improvement (was: Bug) > Better error message if 'path' is not specified > --- > > Key: SPARK-13271 > URL: https://issues.apache.org/jira/browse/SPARK-13271 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Shixiong Zhu > > As per discussion in > https://github.com/apache/spark/pull/11034#discussion_r52111238 > we should improve the error message: > {code} > scala> sqlContext.read.format("text").load() > java.util.NoSuchElementException: key not found: path > at scala.collection.MapLike$class.default(MapLike.scala:228) > at > org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.default(ddl.scala:159) > at scala.collection.MapLike$class.apply(MapLike.scala:141) > at > org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.apply(ddl.scala:159) > at > org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$10.apply(ResolvedDataSource.scala:200) > at > org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$10.apply(ResolvedDataSource.scala:200) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:200) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:129) > ... 49 elided > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13274) Fix Aggregator Links on GroupedDataset Scala API
[ https://issues.apache.org/jira/browse/SPARK-13274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141607#comment-15141607 ] Apache Spark commented on SPARK-13274: -- User 'raelawang' has created a pull request for this issue: https://github.com/apache/spark/pull/11156 > Fix Aggregator Links on GroupedDataset Scala API > - > > Key: SPARK-13274 > URL: https://issues.apache.org/jira/browse/SPARK-13274 > Project: Spark > Issue Type: Documentation > Components: Documentation >Reporter: Raela Wang >Priority: Trivial > > Update Scala API docs for GroupedDataset. Links in flatMapGroups() and > mapGroups() are pointing to the wrong Aggregator. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13274) Fix Aggregator Links on GroupedDataset Scala API
[ https://issues.apache.org/jira/browse/SPARK-13274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13274: Assignee: Apache Spark > Fix Aggregator Links on GroupedDataset Scala API > - > > Key: SPARK-13274 > URL: https://issues.apache.org/jira/browse/SPARK-13274 > Project: Spark > Issue Type: Documentation > Components: Documentation >Reporter: Raela Wang >Assignee: Apache Spark >Priority: Trivial > > Update Scala API docs for GroupedDataset. Links in flatMapGroups() and > mapGroups() are pointing to the wrong Aggregator. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13274) Fix Aggregator Links on GroupedDataset Scala API
[ https://issues.apache.org/jira/browse/SPARK-13274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13274: Assignee: (was: Apache Spark) > Fix Aggregator Links on GroupedDataset Scala API > - > > Key: SPARK-13274 > URL: https://issues.apache.org/jira/browse/SPARK-13274 > Project: Spark > Issue Type: Documentation > Components: Documentation >Reporter: Raela Wang >Priority: Trivial > > Update Scala API docs for GroupedDataset. Links in flatMapGroups() and > mapGroups() are pointing to the wrong Aggregator. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13275) With dynamic allocation, executors appear to be added before job starts
[ https://issues.apache.org/jira/browse/SPARK-13275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephanie Bodoff updated SPARK-13275: - Attachment: webui.png > With dynamic allocation, executors appear to be added before job starts > --- > > Key: SPARK-13275 > URL: https://issues.apache.org/jira/browse/SPARK-13275 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.5.0 >Reporter: Stephanie Bodoff >Priority: Minor > Attachments: webui.png > > > When I look at the timeline in the Spark Web UI, I see the job starting and > then executors being added. The blue lines and dots on the timeline show > that the executors were added after the job started, but the way the Executor > box is rendered makes it look as if the executors started before the job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13275) With dynamic allocation, executors appear to be added before job starts
Stephanie Bodoff created SPARK-13275: Summary: With dynamic allocation, executors appear to be added before job starts Key: SPARK-13275 URL: https://issues.apache.org/jira/browse/SPARK-13275 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.5.0 Reporter: Stephanie Bodoff Priority: Minor Attachments: webui.png When I look at the timeline in the Spark Web UI, I see the job starting and then executors being added. The blue lines and dots on the timeline show that the executors were added after the job started, but the way the Executor box is rendered makes it look as if the executors started before the job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13126) History Server page always has horizontal scrollbar
[ https://issues.apache.org/jira/browse/SPARK-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-13126. --- Resolution: Fixed Fix Version/s: 2.0.0 > History Server page always has horizontal scrollbar > --- > > Key: SPARK-13126 > URL: https://issues.apache.org/jira/browse/SPARK-13126 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Alex Bozarth >Assignee: Zhuo Liu >Priority: Minor > Fix For: 2.0.0 > > Attachments: page_width.png > > > The new History Server page table is always wider than the page, no matter how > much larger you make the window. It is most likely an odd CSS error; it doesn't seem > to be a simple fix when manipulating the CSS using the Web Inspector. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11416) Upgrade kryo package to version 3.0
[ https://issues.apache.org/jira/browse/SPARK-11416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141589#comment-15141589 ] Oscar Boykin commented on SPARK-11416: -- Related issue: https://issues.apache.org/jira/browse/STORM-1537 > Upgrade kryo package to version 3.0 > --- > > Key: SPARK-11416 > URL: https://issues.apache.org/jira/browse/SPARK-11416 > Project: Spark > Issue Type: Wish > Components: Build >Affects Versions: 1.5.1 >Reporter: Hitoshi Ozawa > > Would like to have Apache Spark upgrade kryo package from 2.x (current) to > 3.x. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13126) History Server page always has horizontal scrollbar
[ https://issues.apache.org/jira/browse/SPARK-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-13126: -- Assignee: Zhuo Liu > History Server page always has horizontal scrollbar > --- > > Key: SPARK-13126 > URL: https://issues.apache.org/jira/browse/SPARK-13126 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Alex Bozarth >Assignee: Zhuo Liu >Priority: Minor > Attachments: page_width.png > > > The new History Server page table is always wider than the page, no matter how > much larger you make the window. It is most likely an odd CSS error; it doesn't seem > to be a simple fix when manipulating the CSS using the Web Inspector. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13274) Fix Aggregator Links on GroupedDataset Scala API
Raela Wang created SPARK-13274: -- Summary: Fix Aggregator Links on GroupedDataset Scala API Key: SPARK-13274 URL: https://issues.apache.org/jira/browse/SPARK-13274 Project: Spark Issue Type: Documentation Components: Documentation Reporter: Raela Wang Priority: Trivial Update Scala API docs for GroupedDataset. Links in flatMapGroups() and mapGroups() are pointing to the wrong Aggregator. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13273) Improve test coverage of CatalystQl
Herman van Hovell created SPARK-13273: - Summary: Improve test coverage of CatalystQl Key: SPARK-13273 URL: https://issues.apache.org/jira/browse/SPARK-13273 Project: Spark Issue Type: Improvement Components: SQL Reporter: Herman van Hovell The current CatalystQl tests are quite basic and are far from complete. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13272) Clean-up CatalystQl
Herman van Hovell created SPARK-13272: - Summary: Clean-up CatalystQl Key: SPARK-13272 URL: https://issues.apache.org/jira/browse/SPARK-13272 Project: Spark Issue Type: Improvement Components: SQL Reporter: Herman van Hovell We still have some technical debt in CatalystQl: * It should be placed in the parser package. * Most of the methods are lacking proper documentation. * Some code (regexes) could be moved into an Object. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13270) Improve readability of whole stage codegen by skipping empty lines and outputting the pipeline plan
[ https://issues.apache.org/jira/browse/SPARK-13270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13270: Assignee: Apache Spark > Improve readability of whole stage codegen by skipping empty lines and > outputting the pipeline plan > --- > > Key: SPARK-13270 > URL: https://issues.apache.org/jira/browse/SPARK-13270 > Project: Spark > Issue Type: Bug >Reporter: Nong Li >Assignee: Apache Spark > > It would be nice to comment the generated function with the pipeline it is > for, particularly for complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13270) Improve readability of whole stage codegen by skipping empty lines and outputting the pipeline plan
[ https://issues.apache.org/jira/browse/SPARK-13270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141571#comment-15141571 ] Apache Spark commented on SPARK-13270: -- User 'nongli' has created a pull request for this issue: https://github.com/apache/spark/pull/11155 > Improve readability of whole stage codegen by skipping empty lines and > outputting the pipeline plan > --- > > Key: SPARK-13270 > URL: https://issues.apache.org/jira/browse/SPARK-13270 > Project: Spark > Issue Type: Bug >Reporter: Nong Li > > It would be nice to comment the generated function with the pipeline it is > for, particularly for complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13270) Improve readability of whole stage codegen by skipping empty lines and outputting the pipeline plan
[ https://issues.apache.org/jira/browse/SPARK-13270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13270: Assignee: (was: Apache Spark) > Improve readability of whole stage codegen by skipping empty lines and > outputting the pipeline plan > --- > > Key: SPARK-13270 > URL: https://issues.apache.org/jira/browse/SPARK-13270 > Project: Spark > Issue Type: Bug >Reporter: Nong Li > > It would be nice to comment the generated function with the pipeline it is > for, particularly for complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13163) Column width on new History Server DataTables not getting set correctly
[ https://issues.apache.org/jira/browse/SPARK-13163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-13163. --- Resolution: Fixed Fix Version/s: 2.0.0 > Column width on new History Server DataTables not getting set correctly > --- > > Key: SPARK-13163 > URL: https://issues.apache.org/jira/browse/SPARK-13163 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Alex Bozarth >Priority: Minor > Fix For: 2.0.0 > > Attachments: page_width_fixed.png, width_long_name.png > > > The column width on the DataTable UI for the History Server is being set for > all entries in the table, not just the current page. This means that if there is > even one app with a long name in your history, the table will look really odd, > as seen below. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13271) Better error message if 'path' is not specified
[ https://issues.apache.org/jira/browse/SPARK-13271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141552#comment-15141552 ] Apache Spark commented on SPARK-13271: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/11154 > Better error message if 'path' is not specified > --- > > Key: SPARK-13271 > URL: https://issues.apache.org/jira/browse/SPARK-13271 > Project: Spark > Issue Type: Bug >Reporter: Shixiong Zhu > > As per discussion in > https://github.com/apache/spark/pull/11034#discussion_r52111238 > we should improve the error message: > {code} > scala> sqlContext.read.format("text").load() > java.util.NoSuchElementException: key not found: path > at scala.collection.MapLike$class.default(MapLike.scala:228) > at > org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.default(ddl.scala:159) > at scala.collection.MapLike$class.apply(MapLike.scala:141) > at > org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.apply(ddl.scala:159) > at > org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$10.apply(ResolvedDataSource.scala:200) > at > org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$10.apply(ResolvedDataSource.scala:200) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:200) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:129) > ... 49 elided > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13271) Better error message if 'path' is not specified
[ https://issues.apache.org/jira/browse/SPARK-13271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13271: Assignee: (was: Apache Spark) > Better error message if 'path' is not specified > --- > > Key: SPARK-13271 > URL: https://issues.apache.org/jira/browse/SPARK-13271 > Project: Spark > Issue Type: Bug >Reporter: Shixiong Zhu > > As per discussion in > https://github.com/apache/spark/pull/11034#discussion_r52111238 > we should improve the error message: > {code} > scala> sqlContext.read.format("text").load() > java.util.NoSuchElementException: key not found: path > at scala.collection.MapLike$class.default(MapLike.scala:228) > at > org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.default(ddl.scala:159) > at scala.collection.MapLike$class.apply(MapLike.scala:141) > at > org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.apply(ddl.scala:159) > at > org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$10.apply(ResolvedDataSource.scala:200) > at > org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$10.apply(ResolvedDataSource.scala:200) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:200) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:129) > ... 49 elided > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13271) Better error message if 'path' is not specified
[ https://issues.apache.org/jira/browse/SPARK-13271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13271: Assignee: Apache Spark > Better error message if 'path' is not specified > --- > > Key: SPARK-13271 > URL: https://issues.apache.org/jira/browse/SPARK-13271 > Project: Spark > Issue Type: Bug >Reporter: Shixiong Zhu >Assignee: Apache Spark > > As per discussion in > https://github.com/apache/spark/pull/11034#discussion_r52111238 > we should improve the error message: > {code} > scala> sqlContext.read.format("text").load() > java.util.NoSuchElementException: key not found: path > at scala.collection.MapLike$class.default(MapLike.scala:228) > at > org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.default(ddl.scala:159) > at scala.collection.MapLike$class.apply(MapLike.scala:141) > at > org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.apply(ddl.scala:159) > at > org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$10.apply(ResolvedDataSource.scala:200) > at > org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$10.apply(ResolvedDataSource.scala:200) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:200) > at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:129) > ... 49 elided > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13271) Better error message if 'path' is not specified
Shixiong Zhu created SPARK-13271: Summary: Better error message if 'path' is not specified Key: SPARK-13271 URL: https://issues.apache.org/jira/browse/SPARK-13271 Project: Spark Issue Type: Bug Reporter: Shixiong Zhu As per discussion in https://github.com/apache/spark/pull/11034#discussion_r52111238 we should improve the error message: {code} scala> sqlContext.read.format("text").load() java.util.NoSuchElementException: key not found: path at scala.collection.MapLike$class.default(MapLike.scala:228) at org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.default(ddl.scala:159) at scala.collection.MapLike$class.apply(MapLike.scala:141) at org.apache.spark.sql.execution.datasources.CaseInsensitiveMap.apply(ddl.scala:159) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$10.apply(ResolvedDataSource.scala:200) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$$anonfun$10.apply(ResolvedDataSource.scala:200) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:200) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:129) ... 49 elided {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
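A minimal sketch of the kind of check the linked discussion points at: look up 'path' explicitly and fail with a readable message instead of letting the map lookup surface as a bare NoSuchElementException. The helper below is illustrative, not the actual ResolvedDataSource change.
{code}
// Illustrative only: resolve the 'path' option up front with a descriptive error.
def requiredPath(options: Map[String, String]): String =
  options.getOrElse("path", throw new IllegalArgumentException(
    "'path' is not specified; pass it to load(path) or set it via option(\"path\", ...)"))

// requiredPath(Map("header" -> "true"))
// => java.lang.IllegalArgumentException: 'path' is not specified; ...
{code}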
[jira] [Commented] (SPARK-13054) Always post TaskEnd event for tasks in cancelled stages
[ https://issues.apache.org/jira/browse/SPARK-13054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141520#comment-15141520 ] Apache Spark commented on SPARK-13054: -- User 'tgravescs' has created a pull request for this issue: https://github.com/apache/spark/pull/10951 > Always post TaskEnd event for tasks in cancelled stages > --- > > Key: SPARK-13054 > URL: https://issues.apache.org/jira/browse/SPARK-13054 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > > {code} > // The success case is dealt with separately below. > // TODO: Why post it only for failed tasks in cancelled stages? Clarify > semantics here. > if (event.reason != Success) { > val attemptId = task.stageAttemptId > listenerBus.post(SparkListenerTaskEnd( > stageId, attemptId, taskType, event.reason, event.taskInfo, > taskMetrics)) > } > {code} > Today we only post task end events for canceled stages if the task failed. > There is no reason why we shouldn't just post it for all the tasks, including > the ones that succeeded. If we do that we will be able to simplify another > branch in the DAGScheduler, which needs a lot of simplification. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13270) Improve readability of whole stage codegen by skipping empty lines and outputting the pipeline plan
Nong Li created SPARK-13270: --- Summary: Improve readability of whole stage codegen by skipping empty lines and outputting the pipeline plan Key: SPARK-13270 URL: https://issues.apache.org/jira/browse/SPARK-13270 Project: Spark Issue Type: Bug Reporter: Nong Li It would be nice to comment the generated function with the pipeline it is for, particularly for complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13269) Expose more executor stats in stable status API
Andrew Or created SPARK-13269: - Summary: Expose more executor stats in stable status API Key: SPARK-13269 URL: https://issues.apache.org/jira/browse/SPARK-13269 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Andrew Or Currently the stable status API is quite limited; it exposes only a small subset of the things exposed by JobProgressListener. It is useful for very high level querying but falls short when the developer wants to build an application on top of Spark with more integration. In this issue I propose that we expose at least two things: - Which executors are running tasks, and - Which executors cached how much in memory and on disk The goal is not to expose exactly these two things, but to expose something that would allow the developer to learn about them. These concepts are very much fundamental in Spark's design so there's almost no chance that they will go away in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
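For comparison, the job- and stage-level information the stable status API already exposes can be queried as below; the executor-level call in the final comment is a hypothetical shape for what this issue asks for, not an existing method.
{code}
// What the stable status API exposes today: job- and stage-level progress.
val tracker = sc.statusTracker
tracker.getActiveJobIds().foreach { jobId =>
  tracker.getJobInfo(jobId).foreach(info => println(s"job $jobId -> ${info.status}"))
}
tracker.getActiveStageIds().foreach { stageId =>
  tracker.getStageInfo(stageId).foreach(info =>
    println(s"stage $stageId: ${info.numCompletedTasks}/${info.numTasks} tasks"))
}

// What this issue asks for would be executor-level detail, e.g. a hypothetical:
// tracker.getExecutorInfos().foreach(e => println(s"${e.host} cached ${e.memoryUsed} bytes"))
{code}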
[jira] [Updated] (SPARK-13269) Expose more executor stats in stable status API
[ https://issues.apache.org/jira/browse/SPARK-13269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-13269: -- Issue Type: Improvement (was: Bug) > Expose more executor stats in stable status API > --- > > Key: SPARK-13269 > URL: https://issues.apache.org/jira/browse/SPARK-13269 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Andrew Or > > Currently the stable status API is quite limited; it exposes only a small > subset of the things exposed by JobProgressListener. It is useful for very > high level querying but falls short when the developer wants to build an > application on top of Spark with more integration. > In this issue I propose that we expose at least two things: > - Which executors are running tasks, and > - Which executors cached how much in memory and on disk > The goal is not to expose exactly these two things, but to expose something > that would allow the developer to learn about them. These concepts are very > much fundamental in Spark's design so there's almost no chance that they will > go away in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13254) Fix planning of TakeOrderedAndProject operator
[ https://issues.apache.org/jira/browse/SPARK-13254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-13254. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11145 [https://github.com/apache/spark/pull/11145] > Fix planning of TakeOrderedAndProject operator > -- > > Key: SPARK-13254 > URL: https://issues.apache.org/jira/browse/SPARK-13254 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Josh Rosen >Assignee: Josh Rosen > Fix For: 2.0.0 > > > The patch for SPARK-8964 ("use Exchange to perform shuffle in Limit") > inadvertently broke the planning of the TakeOrderedAndProject operator: > because ReturnAnswer was the new root of the query plan, the > TakeOrderedAndProject rule was unable to match before BasicOperators. We > should fix this by moving all rules that match on ReturnAnswer to run at the > start of the physical planning process. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5095) Support launching multiple mesos executors in coarse grained mesos mode
[ https://issues.apache.org/jira/browse/SPARK-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or resolved SPARK-5095. -- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 > Support launching multiple mesos executors in coarse grained mesos mode > --- > > Key: SPARK-5095 > URL: https://issues.apache.org/jira/browse/SPARK-5095 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 1.0.0 >Reporter: Timothy Chen >Assignee: Timothy Chen > Fix For: 2.0.0 > > > Currently, in coarse-grained Mesos mode, it's expected that we only launch one > Mesos executor, which launches one JVM process, to run multiple Spark > executors. > However, this becomes a problem when the launched JVM process is larger than > an ideal size (30 GB is the value recommended by Databricks), which causes GC > problems reported on the mailing list. > We should support launching multiple executors when enough resources > are available for Spark to use and these resources are still under the > configured limit. > This is also applicable when users want to specify the number of executors to be > launched on each node. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
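As a usage-level illustration (an assumption about how the feature surfaces to users, not something stated in this ticket), capping executor size so that several executors fit on one large Mesos node might look like this:
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values and Mesos master URL; the point is keeping each executor JVM small
// so a node with many free cores gets several executors instead of one huge JVM.
val conf = new SparkConf()
  .setMaster("mesos://zk://s1:2181,s2:2181,s3:2181/mesos")
  .setAppName("coarse-grained-multi-executor-sketch")
  .set("spark.cores.max", "16")        // total cores for the application
  .set("spark.executor.cores", "4")    // per-executor cap, assumed to apply on coarse-grained Mesos
  .set("spark.executor.memory", "8g")  // keep each heap well below the ~30 GB GC pain point
val sc = new SparkContext(conf)
{code}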
[jira] [Commented] (SPARK-13174) Add API and options for csv data sources
[ https://issues.apache.org/jira/browse/SPARK-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141438#comment-15141438 ] Davies Liu commented on SPARK-13174: [~GayathriMurali] Yes, there is a way, but it's not as good as other built-in data sources (like parquet, json, jdbc). > Add API and options for csv data sources > > > Key: SPARK-13174 > URL: https://issues.apache.org/jira/browse/SPARK-13174 > Project: Spark > Issue Type: New Feature > Components: Input/Output >Reporter: Davies Liu > > We should have an API to load CSV data sources (with some options as > arguments), similar to json() and jdbc(). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
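Concretely, loading CSV today goes through the external spark-csv package (also referenced earlier in this digest), while the ticket asks for a first-class convenience similar to json()/jdbc(); the csv() call in the trailing comment is hypothetical here, not an existing method.
{code}
// Today: requires the com.databricks:spark-csv package on the classpath.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("people.csv")

// What the ticket asks for, by analogy with json() and jdbc() -- a hypothetical convenience:
// val df2 = sqlContext.read.option("header", "true").csv("people.csv")
{code}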
[jira] [Assigned] (SPARK-12705) Sorting column can't be resolved if it's not in projection
[ https://issues.apache.org/jira/browse/SPARK-12705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12705: Assignee: Apache Spark (was: Davies Liu) > Sorting column can't be resolved if it's not in projection > -- > > Key: SPARK-12705 > URL: https://issues.apache.org/jira/browse/SPARK-12705 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Davies Liu >Assignee: Apache Spark > Fix For: 2.0.0 > > > The following query can't be resolved: > {code} > scala> sqlContext.sql("select sum(a) over () from (select 1 as a, 2 as b) t > order by b").explain() > org.apache.spark.sql.AnalysisException: cannot resolve 'b' given input > columns: [_c0]; line 1 pos 63 > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:60) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:57) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:336) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:336) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:335) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:333) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:333) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:282) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:322) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:333) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:109) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:119) > at > org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:123) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12705) Sorting column can't be resolved if it's not in projection
[ https://issues.apache.org/jira/browse/SPARK-12705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12705: Assignee: Davies Liu (was: Apache Spark) > Sorting column can't be resolved if it's not in projection > -- > > Key: SPARK-12705 > URL: https://issues.apache.org/jira/browse/SPARK-12705 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu > Fix For: 2.0.0 > > > The following query can't be resolved: > {code} > scala> sqlContext.sql("select sum(a) over () from (select 1 as a, 2 as b) t > order by b").explain() > org.apache.spark.sql.AnalysisException: cannot resolve 'b' given input > columns: [_c0]; line 1 pos 63 > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:60) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:57) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:336) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:336) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:335) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:333) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:333) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:282) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:322) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:333) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:109) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:119) > at > org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:123) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12705) Sorting column can't be resolved if it's not in projection
[ https://issues.apache.org/jira/browse/SPARK-12705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141387#comment-15141387 ] Apache Spark commented on SPARK-12705: -- User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/11153 > Sorting column can't be resolved if it's not in projection > -- > > Key: SPARK-12705 > URL: https://issues.apache.org/jira/browse/SPARK-12705 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu > Fix For: 2.0.0 > > > The following query can't be resolved: > {code} > scala> sqlContext.sql("select sum(a) over () from (select 1 as a, 2 as b) t > order by b").explain() > org.apache.spark.sql.AnalysisException: cannot resolve 'b' given input > columns: [_c0]; line 1 pos 63 > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:60) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:57) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:336) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:336) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:335) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:333) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:333) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:282) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:322) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:333) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:109) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:119) > at > org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:123) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-12705) Sorting column can't be resolved if it's not in projection
[ https://issues.apache.org/jira/browse/SPARK-12705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reopened SPARK-12705: Assignee: Davies Liu (was: Xiao Li) The Q98 is still can't be analyzed, I will send a PR to fix that. > Sorting column can't be resolved if it's not in projection > -- > > Key: SPARK-12705 > URL: https://issues.apache.org/jira/browse/SPARK-12705 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu > Fix For: 2.0.0 > > > The following query can't be resolved: > {code} > scala> sqlContext.sql("select sum(a) over () from (select 1 as a, 2 as b) t > order by b").explain() > org.apache.spark.sql.AnalysisException: cannot resolve 'b' given input > columns: [_c0]; line 1 pos 63 > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:60) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:57) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:336) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:336) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:335) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:333) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:333) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:282) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:322) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:333) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:109) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:119) > at > org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:123) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
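Until the fix lands, one workaround on affected builds is to keep the ordering column in the projection and drop it after the sort. A minimal sketch in the spark-shell, assuming the same {{sqlContext}} as in the report:
{code}
// Selecting b alongside the window aggregate lets the analyzer resolve the ORDER BY;
// the helper column is dropped once the rows are ordered.
val df = sqlContext.sql(
  "select sum(a) over () as s, b from (select 1 as a, 2 as b) t order by b")
df.drop("b").show()
{code}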
[jira] [Commented] (SPARK-13061) Error in spark rest api application info for job names contains spaces
[ https://issues.apache.org/jira/browse/SPARK-13061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141340#comment-15141340 ] Devaraj K commented on SPARK-13061: --- You have mentioned the id as 'Spark shell' in the issue description; I don't think that is what the API actually returns as the id. {code:xml} http://spark.mysite.com:20888/proxy/application_1447676402999_1254/api/v1/applications/ returns: [ { "id" : "Spark shell", "name" : "Spark shell", {code} If we request an HTTP server with a URL that contains spaces, from a browser or any other client, the client encodes the URL (replacing spaces with %20) before sending the request to the HTTP server. This is what happens when you pass the id as "Spark shell". {code:xml}/applications/[app-id]/jobs/[job-id] Details for the given job{code} I think you need to pass the job-id, not the name, if you want details for a specific job. > Error in spark rest api application info for job names contains spaces > -- > > Key: SPARK-13061 > URL: https://issues.apache.org/jira/browse/SPARK-13061 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2 >Reporter: Avihoo Mamka >Priority: Trivial > Labels: rest_api, spark > > When accessing spark rest api with application id to get job specific id > status, a job with name containing whitespaces are being encoded to '%20' and > therefore the rest api returns `no such app`. > For example: > http://spark.mysite.com:20888/proxy/application_1447676402999_1254/api/v1/applications/ > returns: > [ { > "id" : "Spark shell", > "name" : "Spark shell", > "attempts" : [ { > "startTime" : "2016-01-28T09:20:58.526GMT", > "endTime" : "1969-12-31T23:59:59.999GMT", > "sparkUser" : "", > "completed" : false > } ] > } ] > and then when accessing: > http://spark.mysite.com:20888/proxy/application_1447676402999_1254/api/v1/applications/Spark > shell/ > the result returned is: > unknown app: Spark%20shell -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
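The encoding behaviour described in the comment is client-side and easy to reproduce outside of Spark. A small sketch using only the JDK (the host and application id below are the placeholders from the report, not a real endpoint):
{code}
import java.net.URLEncoder

val base = "http://spark.mysite.com:20888/proxy/application_1447676402999_1254/api/v1/applications/"
// Spaces in a path segment have to be percent-encoded before the request is sent,
// which is why the server ends up seeing "Spark%20shell" rather than "Spark shell".
val encoded = URLEncoder.encode("Spark shell", "UTF-8").replace("+", "%20")
println(base + encoded)  // .../applications/Spark%20shell
{code}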
[jira] [Commented] (SPARK-13253) Error aliasing array columns.
[ https://issues.apache.org/jira/browse/SPARK-13253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141252#comment-15141252 ] kevin yu commented on SPARK-13253: -- I can recreate the problem, I am looking at it now > Error aliasing array columns. > - > > Key: SPARK-13253 > URL: https://issues.apache.org/jira/browse/SPARK-13253 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Rakesh Chalasani > > Getting an "UnsupportedOperationException" when trying to alias an > array column. > The issue seems over "toString" on Column. "CreateArray" expression -> > dataType, which checks for nullability of its children, while aliasing is > creating a PrettyAttribute that does not implement nullability. > Code to reproduce the error: > {code} > import org.apache.spark.sql.SQLContext > val sqlContext = new SQLContext(sparkContext) > import sqlContext.implicits._ > import org.apache.spark.sql.functions > case class Test(a:Int, b:Int) > val data = sparkContext.parallelize(Array.range(0, 10).map(x => Test(x, > x+1))) > val df = data.toDF() > val arrayCol = functions.array(df("a"), df("b")).as("arrayCol") > arrayCol.toString() > {code} > Error message: > {code} > java.lang.UnsupportedOperationException > at > org.apache.spark.sql.catalyst.expressions.PrettyAttribute.nullable(namedExpressions.scala:289) > at > org.apache.spark.sql.catalyst.expressions.CreateArray$$anonfun$dataType$3.apply(complexTypeCreator.scala:40) > at > org.apache.spark.sql.catalyst.expressions.CreateArray$$anonfun$dataType$3.apply(complexTypeCreator.scala:40) > at > scala.collection.IndexedSeqOptimized$$anonfun$exists$1.apply(IndexedSeqOptimized.scala:40) > at > scala.collection.IndexedSeqOptimized$$anonfun$exists$1.apply(IndexedSeqOptimized.scala:40) > at > scala.collection.IndexedSeqOptimized$class.segmentLength(IndexedSeqOptimized.scala:189) > at > scala.collection.mutable.ArrayBuffer.segmentLength(ArrayBuffer.scala:47) > at scala.collection.GenSeqLike$class.prefixLength(GenSeqLike.scala:92) > at scala.collection.AbstractSeq.prefixLength(Seq.scala:40) > at > scala.collection.IndexedSeqOptimized$class.exists(IndexedSeqOptimized.scala:40) > at scala.collection.mutable.ArrayBuffer.exists(ArrayBuffer.scala:47) > at > org.apache.spark.sql.catalyst.expressions.CreateArray.dataType(complexTypeCreator.scala:40) > at > org.apache.spark.sql.catalyst.expressions.Alias.dataType(namedExpressions.scala:136) > at > org.apache.spark.sql.catalyst.expressions.NamedExpression$class.typeSuffix(namedExpressions.scala:84) > at > org.apache.spark.sql.catalyst.expressions.Alias.typeSuffix(namedExpressions.scala:120) > at > org.apache.spark.sql.catalyst.expressions.Alias.toString(namedExpressions.scala:155) > at > org.apache.spark.sql.catalyst.expressions.Expression.prettyString(Expression.scala:207) > at org.apache.spark.sql.Column.toString(Column.scala:138) > at java.lang.String.valueOf(String.java:2994) > at scala.runtime.ScalaRunTime$.stringOf(ScalaRunTime.scala:331) > at scala.runtime.ScalaRunTime$.replStringOf(ScalaRunTime.scala:337) > at .(:20) > at .() > at $print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) > at > 
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) > at > org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
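From the stack trace, the failure appears to be limited to pretty-printing the {{Column}} itself ({{Column.toString}} builds a {{PrettyAttribute}} that does not implement {{nullable}}); using the aliased array column in a query still seems to work. A small sketch, assuming the same {{df}} and imports as in the reproduction above:
{code}
// Selecting the aliased array column analyzes and evaluates fine on affected builds;
// only printing the unresolved Column (e.g. the REPL echoing arrayCol) hits the exception.
val result = df.select(functions.array(df("a"), df("b")).as("arrayCol"))
result.printSchema()
result.show()
{code}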
[jira] [Commented] (SPARK-12449) Pushing down arbitrary logical plans to data sources
[ https://issues.apache.org/jira/browse/SPARK-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141243#comment-15141243 ] Evan Chan commented on SPARK-12449: --- [~rxin] I agree with [~stephank85] and others that this would be a huge help. At the very least, if the expressions could be pushed down that would help a lot. Many databases are doing custom work to get the pushdowns needed, and I was thinking of doing something very similar and was going to propose something just like this. > Pushing down arbitrary logical plans to data sources > > > Key: SPARK-12449 > URL: https://issues.apache.org/jira/browse/SPARK-12449 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Stephan Kessler > Attachments: pushingDownLogicalPlans.pdf > > > With the help of the DataSource API we can pull data from external sources > for processing. Implementing interfaces such as {{PrunedFilteredScan}} allows > to push down filters and projects pruning unnecessary fields and rows > directly in the data source. > However, data sources such as SQL Engines are capable of doing even more > preprocessing, e.g., evaluating aggregates. This is beneficial because it > would reduce the amount of data transferred from the source to Spark. The > existing interfaces do not allow such kind of processing in the source. > We would propose to add a new interface {{CatalystSource}} that allows to > defer the processing of arbitrary logical plans to the data source. We have > already shown the details at the Spark Summit 2015 Europe > [https://spark-summit.org/eu-2015/events/the-pushdown-of-everything/] > I will add a design document explaining details. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
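For readers less familiar with the current DataSource API: today a relation can only receive pruned columns and simple filters, which is the gap the proposed {{CatalystSource}} would close. A minimal, hypothetical relation using the existing {{PrunedFilteredScan}} trait looks roughly like the sketch below; anything beyond column pruning and filters (aggregates, joins, and so on) still has to run in Spark:
{code}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

class ExampleRelation(@transient val sqlContext: SQLContext)
    extends BaseRelation with PrunedFilteredScan {

  override def schema: StructType =
    StructType(StructField("a", IntegerType) :: StructField("b", IntegerType) :: Nil)

  // A real source would translate `filters` into its own query language; here a single
  // in-memory record is projected onto the requested columns to keep the sketch runnable.
  override def buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row] = {
    val record = Map("a" -> 1, "b" -> 2)
    sqlContext.sparkContext.parallelize(Seq(Row.fromSeq(requiredColumns.map(record).toSeq)))
  }
}
{code}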
[jira] [Commented] (SPARK-13267) Document ?params for the v1 REST API
[ https://issues.apache.org/jira/browse/SPARK-13267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141104#comment-15141104 ] Apache Spark commented on SPARK-13267: -- User 'steveloughran' has created a pull request for this issue: https://github.com/apache/spark/pull/11152 > Document ?params for the v1 REST API > > > Key: SPARK-13267 > URL: https://issues.apache.org/jira/browse/SPARK-13267 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.6.0 >Reporter: Steve Loughran >Priority: Minor > > There's some various ? param options in the v1 rest API, which don't get any > mention except in the HistoryServerSuite. They should be documented in > monitoring.md -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
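For anyone looking for the undocumented parameters in the meantime, the HistoryServerSuite exercises them; for example the applications listing accepts status and date filters. A hedged example (parameter names taken from that suite, history server assumed to be running on its default port):
{code}
import scala.io.Source

// List completed applications recorded since 2015-02-10; the ?status and ?minDate
// query parameters are currently only visible in HistoryServerSuite, not monitoring.md.
val url = "http://localhost:18080/api/v1/applications?status=completed&minDate=2015-02-10"
println(Source.fromURL(url).mkString)
{code}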
[jira] [Assigned] (SPARK-13267) Document ?params for the v1 REST API
[ https://issues.apache.org/jira/browse/SPARK-13267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13267: Assignee: (was: Apache Spark) > Document ?params for the v1 REST API > > > Key: SPARK-13267 > URL: https://issues.apache.org/jira/browse/SPARK-13267 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.6.0 >Reporter: Steve Loughran >Priority: Minor > > There's some various ? param options in the v1 rest API, which don't get any > mention except in the HistoryServerSuite. They should be documented in > monitoring.md -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13267) Document ?params for the v1 REST API
[ https://issues.apache.org/jira/browse/SPARK-13267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13267: Assignee: Apache Spark > Document ?params for the v1 REST API > > > Key: SPARK-13267 > URL: https://issues.apache.org/jira/browse/SPARK-13267 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.6.0 >Reporter: Steve Loughran >Assignee: Apache Spark >Priority: Minor > > There's some various ? param options in the v1 rest API, which don't get any > mention except in the HistoryServerSuite. They should be documented in > monitoring.md -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13268) SQL Timestamp stored as GMT but toString returns GMT-08:00
[ https://issues.apache.org/jira/browse/SPARK-13268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Ganelin updated SPARK-13268: - Description: There is an issue with how timestamps are displayed/converted to Strings in Spark SQL. The documentation states that the timestamp should be created in the GMT time zone, however, if we do so, we see that the output actually contains a -8 hour offset: {code} new Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT]").toInstant.toEpochMilli) res144: java.sql.Timestamp = 2014-12-31 16:00:00.0 new Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT-08:00]").toInstant.toEpochMilli) res145: java.sql.Timestamp = 2015-01-01 00:00:00.0 {code} This result is confusing, unintuitive, and introduces issues when converting from DataFrames containing timestamps to RDDs which are then saved as text. This has the effect of essentially shifting all dates in a dataset by 1 day. The suggested fix for this is to update the timestamp toString representation to either a) Include timezone or b) Correctly display in GMT. This change may well introduce substantial and insidious bugs so I'm not sure how best to resolve this. was: There is an issue with how timestamps are displayed/converted to Strings in Spark SQL. The documentation states that the timestamp should be created in the GMT time zone, however, if we do so, we see that the output actually contains a -8 hour offset: {code} new Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT]").toInstant.toEpochMilli) res144: java.sql.Timestamp = 2014-12-31 16:00:00.0 new Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT-08:00]").toInstant.toEpochMilli) res145: java.sql.Timestamp = 2015-01-01 00:00:00.0 {code} This result is confusing, unintuitive, and introduces issues when converting from DataFrames containing timestamps to RDDs which are then saved as text. This has the effect of essentially shifting all dates in a dataset by 1 day. > SQL Timestamp stored as GMT but toString returns GMT-08:00 > -- > > Key: SPARK-13268 > URL: https://issues.apache.org/jira/browse/SPARK-13268 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Ilya Ganelin > > There is an issue with how timestamps are displayed/converted to Strings in > Spark SQL. The documentation states that the timestamp should be created in > the GMT time zone, however, if we do so, we see that the output actually > contains a -8 hour offset: > {code} > new > Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT]").toInstant.toEpochMilli) > res144: java.sql.Timestamp = 2014-12-31 16:00:00.0 > new > Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT-08:00]").toInstant.toEpochMilli) > res145: java.sql.Timestamp = 2015-01-01 00:00:00.0 > {code} > This result is confusing, unintuitive, and introduces issues when converting > from DataFrames containing timestamps to RDDs which are then saved as text. > This has the effect of essentially shifting all dates in a dataset by 1 day. > The suggested fix for this is to update the timestamp toString representation > to either a) Include timezone or b) Correctly display in GMT. > This change may well introduce substantial and insidious bugs so I'm not sure > how best to resolve this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
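The offset in the examples comes from {{java.sql.Timestamp.toString}}, which always renders in the JVM's default time zone (here GMT-08:00) even though the stored instant is correct. A small sketch showing the same value printed with an explicit GMT formatter:
{code}
import java.sql.Timestamp
import java.text.SimpleDateFormat
import java.time.ZonedDateTime
import java.util.TimeZone

val ts = new Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT]").toInstant.toEpochMilli)
// toString uses the default zone, so this prints 2014-12-31 16:00:00.0 on a GMT-08:00 machine.
println(ts)
// Formatting with an explicit GMT zone shows the underlying instant is not shifted.
val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S")
fmt.setTimeZone(TimeZone.getTimeZone("GMT"))
println(fmt.format(ts))  // 2015-01-01 00:00:00.0
{code}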
[jira] [Updated] (SPARK-13268) SQL Timestamp stored as GMT but toString returns GMT-08:00
[ https://issues.apache.org/jira/browse/SPARK-13268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ilya Ganelin updated SPARK-13268: - Description: There is an issue with how timestamps are displayed/converted to Strings in Spark SQL. The documentation states that the timestamp should be created in the GMT time zone, however, if we do so, we see that the output actually contains a -8 hour offset: {code} new Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT]").toInstant.toEpochMilli) res144: java.sql.Timestamp = 2014-12-31 16:00:00.0 new Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT-08:00]").toInstant.toEpochMilli) res145: java.sql.Timestamp = 2015-01-01 00:00:00.0 {code} This result is confusing, unintuitive, and introduces issues when converting from DataFrames containing timestamps to RDDs which are then saved as text. This has the effect of essentially shifting all dates in a dataset by 1 day. was: There is an issue with how timestamps are displayed/converted to Strings in Spark SQL. The documentation states that the timestamp should be created in the GMT time zone, however, if we do so, we see that the output actually contains a -8 hour offset: {{ new Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT]").toInstant.toEpochMilli) res144: java.sql.Timestamp = 2014-12-31 16:00:00.0 new Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT-08:00]").toInstant.toEpochMilli) res145: java.sql.Timestamp = 2015-01-01 00:00:00.0 }} This result is confusing, unintuitive, and introduces issues when converting from DataFrames containing timestamps to RDDs which are then saved as text. This has the effect of essentially shifting all dates in a dataset by 1 day. > SQL Timestamp stored as GMT but toString returns GMT-08:00 > -- > > Key: SPARK-13268 > URL: https://issues.apache.org/jira/browse/SPARK-13268 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Ilya Ganelin > > There is an issue with how timestamps are displayed/converted to Strings in > Spark SQL. The documentation states that the timestamp should be created in > the GMT time zone, however, if we do so, we see that the output actually > contains a -8 hour offset: > {code} > new > Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT]").toInstant.toEpochMilli) > res144: java.sql.Timestamp = 2014-12-31 16:00:00.0 > new > Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT-08:00]").toInstant.toEpochMilli) > res145: java.sql.Timestamp = 2015-01-01 00:00:00.0 > {code} > This result is confusing, unintuitive, and introduces issues when converting > from DataFrames containing timestamps to RDDs which are then saved as text. > This has the effect of essentially shifting all dates in a dataset by 1 day. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13268) SQL Timestamp stored as GMT but toString returns GMT-08:00
Ilya Ganelin created SPARK-13268: Summary: SQL Timestamp stored as GMT but toString returns GMT-08:00 Key: SPARK-13268 URL: https://issues.apache.org/jira/browse/SPARK-13268 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.6.0 Reporter: Ilya Ganelin There is an issue with how timestamps are displayed/converted to Strings in Spark SQL. The documentation states that the timestamp should be created in the GMT time zone, however, if we do so, we see that the output actually contains a -8 hour offset: {{ new Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT]").toInstant.toEpochMilli) res144: java.sql.Timestamp = 2014-12-31 16:00:00.0 new Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT-08:00]").toInstant.toEpochMilli) res145: java.sql.Timestamp = 2015-01-01 00:00:00.0 }} This result is confusing, unintuitive, and introduces issues when converting from DataFrames containing timestamps to RDDs which are then saved as text. This has the effect of essentially shifting all dates in a dataset by 1 day. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13267) Document ?params for the v1 REST API
Steve Loughran created SPARK-13267: -- Summary: Document ?params for the v1 REST API Key: SPARK-13267 URL: https://issues.apache.org/jira/browse/SPARK-13267 Project: Spark Issue Type: Improvement Components: Documentation Affects Versions: 1.6.0 Reporter: Steve Loughran Priority: Minor There are various ? param options in the v1 REST API which aren't mentioned anywhere except in the HistoryServerSuite. They should be documented in monitoring.md -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11085) Add support for HTTP proxy
[ https://issues.apache.org/jira/browse/SPARK-11085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140906#comment-15140906 ] Prosper Burq commented on SPARK-11085: -- Hi, is this problem still unresolved? I tried several options but could not find a way to allow spark-submit to connect through the proxy. I tried passing the environment variables in different ways, but none of them worked. > Add support for HTTP proxy > --- > > Key: SPARK-11085 > URL: https://issues.apache.org/jira/browse/SPARK-11085 > Project: Spark > Issue Type: Improvement > Components: Spark Shell, Spark Submit >Reporter: Dustin Cote >Priority: Minor > > Add a way to update ivysettings.xml for the spark-shell and spark-submit to > support proxy settings for clusters that need to access a remote repository > through an http proxy. Typically this would be done like: > JAVA_OPTS="$JAVA_OPTS -Dhttp.proxyHost=proxy.host -Dhttp.proxyPort=8080 > -Dhttps.proxyHost=proxy.host.secure -Dhttps.proxyPort=8080" > Directly in the ivysettings.xml would look like: > > proxyport="8080" > nonproxyhosts="nonproxy.host"/> > > Even better would be a way to customize the ivysettings.xml with command > options. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13266) Python DataFrameReader converts None to "None" instead of null
mathieu longtin created SPARK-13266: --- Summary: Python DataFrameReader converts None to "None" instead of null Key: SPARK-13266 URL: https://issues.apache.org/jira/browse/SPARK-13266 Project: Spark Issue Type: Bug Components: PySpark, SQL Affects Versions: 1.6.0 Environment: Linux standalone but probably applies to all Reporter: mathieu longtin If you do something like this: {code:none} tsv_loader = sqlContext.read.format('com.databricks.spark.csv') tsv_loader.options(quote=None, escape=None) {code} The loader sees the string "None" as the _quote_ and _escape_ options. The loader should get a _null_. An easy fix is to modify *python/pyspark/sql/readwriter.py* near the top, correct the _to_str_ function. Here's the patch: {code:none} diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py index a3d7eca..ba18d13 100644 --- a/python/pyspark/sql/readwriter.py +++ b/python/pyspark/sql/readwriter.py @@ -33,10 +33,12 @@ __all__ = ["DataFrameReader", "DataFrameWriter"] def to_str(value): """ -A wrapper over str(), but convert bool values to lower case string +A wrapper over str(), but convert bool values to lower case string, and keep None """ if isinstance(value, bool): return str(value).lower() +elif value is None: +return value else: return str(value) {code} This has been tested and works great. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13265) Refactoring of basic ML import/export for other file system besides HDFS
[ https://issues.apache.org/jira/browse/SPARK-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13265: Assignee: Apache Spark > Refactoring of basic ML import/export for other file system besides HDFS > > > Key: SPARK-13265 > URL: https://issues.apache.org/jira/browse/SPARK-13265 > Project: Spark > Issue Type: Bug > Components: ML >Reporter: Yu Ishikawa >Assignee: Apache Spark > > We can't save a model into other file system besides HDFS, for example Amazon > S3. Because the file system is fixed at Spark 1.6. > https://github.com/apache/spark/blob/v1.6.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L78 > When I tried to export a KMeans model into Amazon S3, I got the error. > {noformat} > scala> val kmeans = new KMeans().setK(2) > scala> val model = kmeans.fit(train) > scala> model.write.overwrite().save("s3n://test-bucket/tmp/test-kmeans/") > java.lang.IllegalArgumentException: Wrong FS: > s3n://test-bucket/tmp/test-kmeans, expected: > hdfs://ec2-54-248-42-97.ap-northeast-1.compute.amazonaws.c > om:9000 > at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:590) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:170) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803) > at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332) > at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:80) > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:36) > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:41) > at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:43) > at $iwC$$iwC$$iwC$$iwC$$iwC.(:45) > at $iwC$$iwC$$iwC$$iwC.(:47) > at $iwC$$iwC$$iwC.(:49) > at $iwC$$iwC.(:51) > at $iwC.(:53) > at (:55) > at .(:59) > at .() > at .(:7) > at .() > at $print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) > at > org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) > at > org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) > at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) > at > org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) > at > org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) > at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) > at > org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) > at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) > at > scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) > at > org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) > at 
org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) > at org.apache.spark.repl.Main$.main(Main.scala:31) > at org.apache.spark.repl.Main.main(Main.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) > at org.ap
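The failure appears to come from resolving the cluster's default FileSystem instead of the one matching the output path's scheme (the ReadWrite.scala line referenced in the description). A rough sketch of how such a check could be made scheme-agnostic (this is an illustration of the idea, not the actual patch), using only stable Hadoop APIs:
{code}
import org.apache.hadoop.fs.Path
import org.apache.spark.SparkContext

// Path.getFileSystem picks the FileSystem implementation that matches the URI scheme
// (hdfs://, s3n://, file://, ...), instead of assuming the cluster's default FS.
def outputExists(sc: SparkContext, path: String): Boolean = {
  val p = new Path(path)
  val fs = p.getFileSystem(sc.hadoopConfiguration)
  fs.exists(p)
}
{code}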