[GitHub] flink pull request: [FLINK-3519] [core] Add warning about subclass...

2016-04-02 Thread ggevay
Github user ggevay commented on the pull request:

https://github.com/apache/flink/pull/1724#issuecomment-204682353
  
OK, no problem, I've updated the PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-3519) Subclasses of Tuples don't work if the declared type of a DataSet is not the descendant

2016-04-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222822#comment-15222822
 ] 

ASF GitHub Bot commented on FLINK-3519:
---

Github user ggevay commented on the pull request:

https://github.com/apache/flink/pull/1724#issuecomment-204682353
  
OK, no problem, I've updated the PR.


> Subclasses of Tuples don't work if the declared type of a DataSet is not the 
> descendant
> ---
>
> Key: FLINK-3519
> URL: https://issues.apache.org/jira/browse/FLINK-3519
> Project: Flink
>  Issue Type: Bug
>  Components: Type Serialization System
>Affects Versions: 1.0.0
>Reporter: Gabor Gevay
>Assignee: Gabor Gevay
>Priority: Minor
>
> If I have a subclass of TupleN, then objects of this type will turn into 
> TupleNs when I try to use them in a DataSet.
> For example, if I have a class like this:
> {code}
> public static class Foo extends Tuple1<Integer> {
>   public short a;
>   public Foo() {}
>   public Foo(int f0, int a) {
>   this.f0 = f0;
>   this.a = (short)a;
>   }
>   @Override
>   public String toString() {
>   return "(" + f0 + ", " + a + ")";
>   }
> }
> {code}
> And then I do this:
> {code}
> env.fromElements(0,0,0).map(new MapFunction<Integer, Tuple1<Integer>>() {
>   @Override
>   public Tuple1<Integer> map(Integer value) throws Exception {
>   return new Foo(5, 6);
>   }
> }).print();
> {code}
> Then I don't have Foos in the output, but only Tuples:
> {code}
> (5)
> (5)
> (5)
> {code}
> The problem is caused by the TupleSerializer not caring about subclasses at 
> all. I guess the reason for this is performance: we don't want to deal with 
> writing and reading subclass tags when we have Tuples.
> I see three options for solving this:
> 1. Add subclass tags to the TupleSerializer: This is not really an option, 
> because we don't want to lose performance.
> 2. Document this behavior in the javadoc of the Tuple classes.
> 3. Make the Tuple types final: this would be the clean solution, but it is 
> API breaking, and the first victim would be Gelly: the Vertex and Edge types 
> extend from tuples. (Note that the issue doesn't appear there, because the 
> DataSets there always have the type of the descendant class.)
> When deciding between 2. and 3., an important point to note is that if you 
> have your class extend from a Tuple type instead of just adding the f0, f1, 
> ... fields manually in the hopes of getting the performance boost associated 
> with Tuples, then you are out of luck: the PojoSerializer will kick in anyway 
> when the declared types of your DataSets are the descendant type.
> If someone knows about a good reason to extend from a Tuple class, then 
> please comment.
> For 2., this is a suggested wording for the javadoc of the Tuple classes:
> Warning: Please don't subclass Tuple classes, but if you do, then be sure to 
> always declare the element type of your DataSets as your descendant type. 
> (That is, if you have a "class A extends Tuple2", then don't use instances of 
> A in a DataSet<Tuple2>, but use DataSet<A>.)
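
The field loss described in the issue can be reproduced without Flink. The following is a minimal sketch, not Flink's actual TupleSerializer: `IntTuple1`, `serialize`, `deserialize` and `roundTrip` are hypothetical stand-ins, and the only assumption carried over from the report is that the serializer is chosen from the declared type and writes no subclass tag.

```java
import java.nio.ByteBuffer;

public class TupleSubclassDemo {

    // Hypothetical stand-in for Tuple1<Integer>; not the Flink class.
    static class IntTuple1 {
        public int f0;
        public IntTuple1() {}
        @Override
        public String toString() {
            return "(" + f0 + ")";
        }
    }

    // Subclass with an extra field, mirroring Foo from the report.
    static class Foo extends IntTuple1 {
        public short a;
        public Foo() {}
        public Foo(int f0, int a) {
            this.f0 = f0;
            this.a = (short) a;
        }
        @Override
        public String toString() {
            return "(" + f0 + ", " + a + ")";
        }
    }

    // Serializer for the declared type only: writes f0 and no subclass tag.
    static byte[] serialize(IntTuple1 value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value.f0).array();
    }

    // Always instantiates the declared type, so subclass state cannot come back.
    static IntTuple1 deserialize(byte[] data) {
        IntTuple1 result = new IntTuple1();
        result.f0 = ByteBuffer.wrap(data).getInt();
        return result;
    }

    static IntTuple1 roundTrip(IntTuple1 value) {
        return deserialize(serialize(value));
    }

    public static void main(String[] args) {
        IntTuple1 original = new Foo(5, 6);
        IntTuple1 copy = roundTrip(original);
        System.out.println(original + " -> " + copy); // prints "(5, 6) -> (5)"
    }
}
```

Writing a subclass tag would fix the round trip, but every record would then pay for the extra bytes and the dispatch on read, which is the performance cost that option 1 refers to.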



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-3665] Implemented sort orders support i...

2016-04-02 Thread dawidwys
GitHub user dawidwys opened a pull request:

https://github.com/apache/flink/pull/1848

[FLINK-3665] Implemented sort orders support in range partitioning



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dawidwys/flink withOrders

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1848.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1848


commit bbb46d6fc7c555ba458cb129805ca323ccc8e2d2
Author: dawid 
Date:   2016-04-02T11:10:59Z

[FLINK-3665] Implemented sort orders support in range partitioning

commit bdbaf8ab9dda530128fd92193c08ddc31b91b4a8
Author: dawid 
Date:   2016-04-02T11:15:29Z

Removed unnecessary code






[jira] [Commented] (FLINK-3665) Range partitioning lacks support to define sort orders

2016-04-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222856#comment-15222856
 ] 

ASF GitHub Bot commented on FLINK-3665:
---

GitHub user dawidwys opened a pull request:

https://github.com/apache/flink/pull/1848

[FLINK-3665] Implemented sort orders support in range partitioning



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dawidwys/flink withOrders

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1848.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1848


commit bbb46d6fc7c555ba458cb129805ca323ccc8e2d2
Author: dawid 
Date:   2016-04-02T11:10:59Z

[FLINK-3665] Implemented sort orders support in range partitioning

commit bdbaf8ab9dda530128fd92193c08ddc31b91b4a8
Author: dawid 
Date:   2016-04-02T11:15:29Z

Removed unnecessary code




> Range partitioning lacks support to define sort orders
> --
>
> Key: FLINK-3665
> URL: https://issues.apache.org/jira/browse/FLINK-3665
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API
>Affects Versions: 1.0.0
>Reporter: Fabian Hueske
> Fix For: 1.1.0
>
>
> {{DataSet.partitionByRange()}} does not allow specifying the sort order of 
> fields. This is fine if range partitioning is used to reduce skewed 
> partitioning. 
> However, it is not sufficient if range partitioning is used to sort a data 
> set in parallel. 
> Since {{DataSet.partitionByRange()}} is {{@Public}} API and cannot be easily 
> changed, I propose to add a method {{withOrders(Order... orders)}} to 
> {{PartitionOperator}}. The method should throw an exception if the 
> partitioning method of {{PartitionOperator}} is not range partitioning.
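
The proposed contract can be sketched in plain Java. This is an illustration under stated assumptions, not the code in the pull request: `PartitionOperatorSketch` and both enums are simplified, hypothetical stand-ins, the issue does not say which exception type to throw (`IllegalStateException` is a guess), and the check that the number of orders matches the number of key fields is an additional assumption.

```java
import java.util.Arrays;

public class WithOrdersSketch {

    // Simplified stand-ins; the real Flink enums may have more constants.
    enum Order { ASCENDING, DESCENDING }
    enum PartitionMethod { HASH, RANGE, REBALANCE }

    // Hypothetical, heavily reduced model of a partition operator.
    static class PartitionOperatorSketch {
        private final PartitionMethod method;
        private final int[] keyFields;
        private Order[] orders;

        PartitionOperatorSketch(PartitionMethod method, int... keyFields) {
            this.method = method;
            this.keyFields = keyFields.clone();
            // Default to ascending so existing callers see no change.
            this.orders = new Order[keyFields.length];
            Arrays.fill(this.orders, Order.ASCENDING);
        }

        // Sort orders only make sense for range partitioning.
        PartitionOperatorSketch withOrders(Order... orders) {
            if (method != PartitionMethod.RANGE) {
                throw new IllegalStateException(
                    "withOrders() requires range partitioning, but method is " + method);
            }
            // Assumption: exactly one order per key field.
            if (orders.length != keyFields.length) {
                throw new IllegalArgumentException(
                    "Expected " + keyFields.length + " orders, got " + orders.length);
            }
            this.orders = orders.clone();
            return this;
        }

        Order[] getOrders() {
            return orders.clone();
        }
    }

    public static void main(String[] args) {
        PartitionOperatorSketch range =
            new PartitionOperatorSketch(PartitionMethod.RANGE, 0, 1)
                .withOrders(Order.DESCENDING, Order.ASCENDING);
        System.out.println(Arrays.toString(range.getOrders()));

        try {
            new PartitionOperatorSketch(PartitionMethod.HASH, 0).withOrders(Order.ASCENDING);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Putting the method on the operator rather than on `partitionByRange()` leaves the existing public signature untouched, which is the constraint the issue names.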





[jira] [Commented] (FLINK-3655) Allow comma-separated or multiple directories to be specified for FileInputFormat

2016-04-02 Thread Tian Li (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222930#comment-15222930
 ] 

Tian Li commented on FLINK-3655:


Hi. I would like to contribute to this issue. Thanks.

> Allow comma-separated or multiple directories to be specified for 
> FileInputFormat
> -
>
> Key: FLINK-3655
> URL: https://issues.apache.org/jira/browse/FLINK-3655
> Project: Flink
>  Issue Type: Improvement
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Gna Phetsarath
>Priority: Minor
>  Labels: starter
>
> Allow comma-separated or multiple directories to be specified for 
> FileInputFormat so that a DataSource will process the directories 
> sequentially.
>
> env.readFile("/data/2016/01/01/*/*,/data/2016/01/02/*/*,/data/2016/01/03/*/*")
> in Scala
>env.readFile(paths: Seq[String])
> or 
>   env.readFile(path: String, otherPaths: String*)
> Wildcard support would be a bonus.
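
One way to read the comma-separated variant is as a preprocessing step that turns the single argument into a list of paths, to be processed in order. The sketch below assumes exactly that and nothing more: `splitPaths` is a hypothetical helper, not an existing Flink method, it does not expand wildcards, and it cannot handle paths that themselves contain commas.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiPathSketch {

    // Hypothetical helper: splits a comma-separated argument into single paths,
    // keeping their order and dropping empty entries.
    static List<String> splitPaths(String commaSeparated) {
        List<String> paths = new ArrayList<>();
        for (String part : commaSeparated.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        String input = "/data/2016/01/01/*/*,/data/2016/01/02/*/*,/data/2016/01/03/*/*";
        // Each path would be handed to the input format one after another.
        for (String path : splitPaths(input)) {
            System.out.println(path);
        }
    }
}
```

The Scala signatures in the request avoid the comma ambiguity altogether by taking the paths as separate arguments.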





[GitHub] flink pull request: FLINK-3657: Change access of DataSetUtils.coun...

2016-04-02 Thread smarthi
Github user smarthi closed the pull request at:

https://github.com/apache/flink/pull/1829




[jira] [Commented] (FLINK-3657) Change access of DataSetUtils.countElements() to 'public'

2016-04-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222964#comment-15222964
 ] 

ASF GitHub Bot commented on FLINK-3657:
---

Github user smarthi commented on the pull request:

https://github.com/apache/flink/pull/1829#issuecomment-204762444
  
Closing this PR without merging as it is "way too specific". Thanks for all 
the feedback.


> Change access of DataSetUtils.countElements() to 'public' 
> --
>
> Key: FLINK-3657
> URL: https://issues.apache.org/jira/browse/FLINK-3657
> Project: Flink
>  Issue Type: Improvement
>  Components: DataSet API
>Affects Versions: 1.0.0
>Reporter: Suneel Marthi
>Assignee: Suneel Marthi
>Priority: Minor
> Fix For: 1.0.1
>
>
> DataSetUtils.countElements() is presently 'private'; change it to 'public'. 
> We happened to be replicating the functionality in our project and realized 
> that the method already existed in Flink.





[GitHub] flink pull request: FLINK-3657: Change access of DataSetUtils.coun...

2016-04-02 Thread smarthi
Github user smarthi commented on the pull request:

https://github.com/apache/flink/pull/1829#issuecomment-204762444
  
Closing this PR without merging as it is "way too specific". Thanks for all 
the feedback.




[jira] [Commented] (FLINK-3657) Change access of DataSetUtils.countElements() to 'public'

2016-04-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222965#comment-15222965
 ] 

ASF GitHub Bot commented on FLINK-3657:
---

Github user smarthi closed the pull request at:

https://github.com/apache/flink/pull/1829







[jira] [Closed] (FLINK-3657) Change access of DataSetUtils.countElements() to 'public'

2016-04-02 Thread Suneel Marthi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suneel Marthi closed FLINK-3657.

Resolution: Won't Fix






[jira] [Commented] (FLINK-3657) Change access of DataSetUtils.countElements() to 'public'

2016-04-02 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15223077#comment-15223077
 ] 

Fabian Hueske commented on FLINK-3657:
--

As I said before, I don't think this functionality is too specific since the 
Mahout community was asking for it. IMO it could be added to DataSetUtils. I 
was only raising the question whether this should be done as part of a bugfix 
release (1.0.1) or not. From my point of view there were arguments for both 
decisions. 
+ the change is minor, not touching the core API, and requested for a Mahout 
release with Flink support.
- Flink 1.0.1 would not be a "clean" bugfix release and it could also be 
implemented as part of Mahout



