[jira] [Created] (SPARK-18299) Allow more aggregations on KeyValueGroupedDataset

2016-11-07 Thread Matthias Niehoff (JIRA)
Matthias Niehoff created SPARK-18299:


 Summary: Allow more aggregations on KeyValueGroupedDataset
 Key: SPARK-18299
 URL: https://issues.apache.org/jira/browse/SPARK-18299
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.0.1
Reporter: Matthias Niehoff


The number of possible aggregations on a KeyValueGroupedDataset created by 
groupByKey is limited to 4, as there are only methods with a maximum of 4 
parameters.

This limit should be increased or, even better, removed entirely.
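A minimal sketch of the limitation, assuming Spark 2.0's typed aggregators and a hypothetical case class Sale(shop: String, amount: Double, items: Long) with an existing SparkSession:

```scala
import org.apache.spark.sql.expressions.scalalang.typed

// assumption: `sales` is a Dataset[Sale] obtained from a SparkSession
val grouped = sales.groupByKey(_.shop)

// Up to four typed aggregations compile, because the agg(...) overloads
// on KeyValueGroupedDataset stop at four columns:
grouped.agg(
  typed.sum[Sale](_.amount),
  typed.count[Sale](_.items),
  typed.avg[Sale](_.amount),
  typed.sumLong[Sale](_.items))

// Adding a fifth aggregation does not compile: there is no agg overload
// that accepts five TypedColumn parameters.
```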



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14236) UDAF does not use incomingSchema for update Method

2016-03-29 Thread Matthias Niehoff (JIRA)
Matthias Niehoff created SPARK-14236:


 Summary: UDAF does not use incomingSchema for update Method
 Key: SPARK-14236
 URL: https://issues.apache.org/jira/browse/SPARK-14236
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.1
Reporter: Matthias Niehoff
Priority: Minor


When I specify a schema for the incoming data in a UDAF, the schema is not 
applied to the incoming row in the update method. I can only access the 
fields by their numeric indices, not by their names: the fields in the row 
are named input0, input1, ...
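A sketch of the report against Spark 1.6's UserDefinedAggregateFunction API; the UDAF itself (a simple sum over a hypothetical "amount" field) is invented for illustration:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

class SumUdaf extends UserDefinedAggregateFunction {
  // The input schema names the field "amount" ...
  def inputSchema: StructType = StructType(StructField("amount", DoubleType) :: Nil)
  def bufferSchema: StructType = StructType(StructField("total", DoubleType) :: Nil)
  def dataType: DataType = DoubleType
  def deterministic: Boolean = true

  def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = 0.0

  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    // works: positional access by index
    buffer(0) = buffer.getDouble(0) + input.getDouble(0)
    // per the report, name-based access such as input.getAs[Double]("amount")
    // fails here, because the row's schema carries generated names
    // (input0, input1, ...) instead of the declared inputSchema names
  }

  def merge(b1: MutableAggregationBuffer, b2: Row): Unit =
    b1(0) = b1.getDouble(0) + b2.getDouble(0)

  def evaluate(buffer: Row): Any = buffer.getDouble(0)
}
```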






[jira] [Updated] (SPARK-11782) Master Web UI should link to correct Application UI in cluster mode

2015-11-25 Thread Matthias Niehoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Niehoff updated SPARK-11782:
-
Priority: Minor  (was: Major)

> Master Web UI should link to correct Application UI in cluster mode
> ---
>
> Key: SPARK-11782
> URL: https://issues.apache.org/jira/browse/SPARK-11782
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.4.1
>Reporter: Matthias Niehoff
>Priority: Minor
>
> - Running a standalone cluster, with node1 as master
> - Submit an application to cluster with deploy-mode=cluster
> - Application driver is on node other than node1 (i.e. node3)
> => master WebUI links to node1:4040 for Application Detail UI and not to 
> node3:4040
> As the master knows on which worker the driver is running, it should be 
> possible to show the correct link to the Application Detail UI






[jira] [Commented] (SPARK-11782) Master Web UI should link to correct Application UI in cluster mode

2015-11-23 Thread Matthias Niehoff (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15021730#comment-15021730
 ] 

Matthias Niehoff commented on SPARK-11782:
--

I submit the app with deploy-mode cluster, so the driver gets started inside 
the cluster. That can then be any node; it does not necessarily have to be 
the node where spark-submit was executed.

> Master Web UI should link to correct Application UI in cluster mode
> ---
>
> Key: SPARK-11782
> URL: https://issues.apache.org/jira/browse/SPARK-11782
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.4.1
>Reporter: Matthias Niehoff
>
> - Running a standalone cluster, with node1 as master
> - Submit an application to cluster with deploy-mode=cluster
> - Application driver is on node other than node1 (i.e. node3)
> => master WebUI links to node1:4040 for Application Detail UI and not to 
> node3:4040
> As the master knows on which worker the driver is running, it should be 
> possible to show the correct link to the Application Detail UI






[jira] [Commented] (SPARK-11782) Master Web UI should link to correct Application UI in cluster mode

2015-11-18 Thread Matthias Niehoff (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011192#comment-15011192
 ] 

Matthias Niehoff commented on SPARK-11782:
--

What I did:

- started master on node 1
- started a worker on node 1
- started a worker on node 2
- "spark-submit --deploy-mode cluster  " on node 1

On the master UI, the Application Detail UI link contains a URL based on 
node1, but the driver is started on node2 (and the web app is only reachable on 
the master?)

Hope this helps :-)

> Master Web UI should link to correct Application UI in cluster mode
> ---
>
> Key: SPARK-11782
> URL: https://issues.apache.org/jira/browse/SPARK-11782
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.4.1
>Reporter: Matthias Niehoff
>
> - Running a standalone cluster, with node1 as master
> - Submit an application to cluster with deploy-mode=cluster
> - Application driver is on node other than node1 (i.e. node3)
> => master WebUI links to node1:4040 for Application Detail UI and not to 
> node3:4040
> As the master knows on which worker the driver is running, it should be 
> possible to show the correct link to the Application Detail UI






[jira] [Created] (SPARK-11782) Master Web UI should link to correct Application UI in cluster mode

2015-11-17 Thread Matthias Niehoff (JIRA)
Matthias Niehoff created SPARK-11782:


 Summary: Master Web UI should link to correct Application UI in 
cluster mode
 Key: SPARK-11782
 URL: https://issues.apache.org/jira/browse/SPARK-11782
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.4.1
Reporter: Matthias Niehoff


- Running a standalone cluster, with node1 as master
- Submit an application to the cluster with deploy-mode=cluster
- The application driver is on a node other than node1 (e.g. node3)

=> the master WebUI links to node1:4040 for the Application Detail UI and not to 
node3:4040

As the master knows on which worker the driver is running, it should be 
possible to show the correct link to the Application Detail UI.
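A reproduction sketch under assumed placeholders (master URL, ports, class name, and jar path are invented; the start-slave.sh arguments vary slightly between Spark versions):

```shell
# on node1: start the master and a worker
$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-slave.sh spark://node1:7077

# on node2: start a second worker
$SPARK_HOME/sbin/start-slave.sh spark://node1:7077

# submit from node1 in cluster mode; the driver may be scheduled on node2
$SPARK_HOME/bin/spark-submit \
  --master spark://node1:7077 \
  --deploy-mode cluster \
  --class com.example.MyApp \
  /path/to/myapp.jar

# symptom: the master UI at http://node1:8080 links the Application Detail UI
# to node1:4040 even when the driver (and its UI) actually run on node2
```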






[jira] [Comment Edited] (SPARK-4751) Support dynamic allocation for standalone mode

2015-10-26 Thread Matthias Niehoff (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974119#comment-14974119
 ] 

Matthias Niehoff edited comment on SPARK-4751 at 10/26/15 8:05 PM:
---

The PR is merged, but the documentation at 
https://spark.apache.org/docs/1.5.1/job-scheduling.html still says:
"This feature is currently disabled by default and available only on YARN."

Is the documentation just outdated or is it not yet available in 1.5.x?


was (Author: j4nu5):
The PR is merged, but the documentation at 
https://spark.apache.org/docs/1.5.1/job-scheduling.html still says:
"This feature is currently disabled by default and available only on YARN."

Is the documentation just outdated or is not yet available in 1.5.x?

> Support dynamic allocation for standalone mode
> --
>
> Key: SPARK-4751
> URL: https://issues.apache.org/jira/browse/SPARK-4751
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Critical
> Fix For: 1.5.0
>
>
> This is equivalent to SPARK-3822 but for standalone mode.
> This is actually a very tricky issue because the scheduling mechanism in the 
> standalone Master uses different semantics. In standalone mode we allocate 
> resources based on cores. By default, an application will grab all the cores 
> in the cluster unless "spark.cores.max" is specified. Unfortunately, this 
> means an application could get executors of different sizes (in terms of 
> cores) if:
> 1) App 1 kills an executor
> 2) App 2, with "spark.cores.max" set, grabs a subset of cores on a worker
> 3) App 1 requests an executor
> In this case, the new executor that App 1 gets back will be smaller than the 
> rest and can execute fewer tasks in parallel. Further, standalone mode is 
> subject to the constraint that only one executor can be allocated on each 
> worker per application. As a result, it is rather meaningless to request new 
> executors if the existing ones are already spread out across all nodes.
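The configuration the follow-up comments ask about can be sketched as a spark-defaults.conf fragment; this is an assumption based on the merged PR discussed above and should be verified against the job-scheduling documentation for the Spark version in use:

```
# spark-defaults.conf (sketch)
# enable dynamic executor allocation
spark.dynamicAllocation.enabled    true
# required by dynamic allocation; in standalone mode the external shuffle
# service runs inside the worker when this is set
spark.shuffle.service.enabled      true
```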






[jira] [Commented] (SPARK-4751) Support dynamic allocation for standalone mode

2015-10-26 Thread Matthias Niehoff (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14974119#comment-14974119
 ] 

Matthias Niehoff commented on SPARK-4751:
-

The PR is merged, but the documentation at 
https://spark.apache.org/docs/1.5.1/job-scheduling.html still says:
"This feature is currently disabled by default and available only on YARN."

Is the documentation just outdated or is it not yet available in 1.5.x?

> Support dynamic allocation for standalone mode
> --
>
> Key: SPARK-4751
> URL: https://issues.apache.org/jira/browse/SPARK-4751
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Critical
> Fix For: 1.5.0
>
>
> This is equivalent to SPARK-3822 but for standalone mode.
> This is actually a very tricky issue because the scheduling mechanism in the 
> standalone Master uses different semantics. In standalone mode we allocate 
> resources based on cores. By default, an application will grab all the cores 
> in the cluster unless "spark.cores.max" is specified. Unfortunately, this 
> means an application could get executors of different sizes (in terms of 
> cores) if:
> 1) App 1 kills an executor
> 2) App 2, with "spark.cores.max" set, grabs a subset of cores on a worker
> 3) App 1 requests an executor
> In this case, the new executor that App 1 gets back will be smaller than the 
> rest and can execute fewer tasks in parallel. Further, standalone mode is 
> subject to the constraint that only one executor can be allocated on each 
> worker per application. As a result, it is rather meaningless to request new 
> executors if the existing ones are already spread out across all nodes.


