[ https://issues.apache.org/jira/browse/FLINK-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665260#comment-15665260 ]
ASF GitHub Bot commented on FLINK-4964:
---------------------------------------

Github user tfournier314 commented on the issue:

    https://github.com/apache/flink/pull/2740

    @greghogan I've not pushed the code yet because my tests are still incorrect.

    Indeed the following code:

        val env = ExecutionEnvironment.getExecutionEnvironment
        val fitData = env.fromCollection(List("a","b","c","a","a","d","a","a","a","b","b","c","a","c","b","c","b"))
        fitData.map(s => (s,1)).groupBy(0)
          .reduce((a,b) => (a._1, a._2 + b._2))
          .partitionByRange(1)
          .sortPartition(1, Order.DESCENDING)
          .zipWithIndex
          .print()

    returns

        (0,(b,5))
        (1,(c,4))
        (2,(d,1))
        (3,(a,7))

    And I would like the following:

        (1,(b,5))
        (2,(c,4))
        (3,(d,1))
        (0,(a,7))

    Even if the order inside partitions is preserved (with mapPartitions), the order between partitions is not right?

> FlinkML - Add StringIndexer
> ---------------------------
>
>                 Key: FLINK-4964
>                 URL: https://issues.apache.org/jira/browse/FLINK-4964
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Thomas FOURNIER
>            Priority: Minor
>
> Add StringIndexer as described here:
> http://spark.apache.org/docs/latest/ml-features.html#stringindexer
> This will be added in package preprocessing of FlinkML

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)