[ https://issues.apache.org/jira/browse/FLINK-3626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15339834#comment-15339834 ]
ASF GitHub Bot commented on FLINK-3626:
---------------------------------------
Github user zentol commented on a diff in the pull request:
https://github.com/apache/flink/pull/2136#discussion_r67719598
--- Diff:
flink-libraries/flink-python/src/main/python/org/apache/flink/python/api/flink/plan/DataSet.py
---
@@ -572,6 +572,41 @@ def set_parallelism(self, parallelism):
         self._info.parallelism.value = parallelism
         return self
+    def count_elements_per_partition(self):
+        class CountElementsPerPartitionMapper(MapPartitionFunction):
+
+            def map_partition(self, iterator, collector):
+                counter = 0
+                for x in iterator:
+                    counter += 1
+
+                collector.collect((self.context.get_index_of_this_subtask(), counter))
+        return self.map_partition(CountElementsPerPartitionMapper())
+
+    def zip_with_index(self):
+        element_count = self.count_elements_per_partition()
+        class ZipWithIndexMapper(MapPartitionFunction):
+            start = 0
--- End diff --
If we started at -1, we wouldn't have to decrement it when collecting
the value.
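
To make the suggestion concrete, here is a minimal sketch in plain Python.
The rest of ZipWithIndexMapper is cut off in the diff above, so this assumes
the mapper increments its counter before emitting each record. The function
names and the `offset` parameter are hypothetical and are not part of the
Flink Python API.

```python
def index_from_zero(values, offset=0):
    """Counter starts at the offset and is incremented before emitting,
    so every emit has to subtract 1 again (assumed current behaviour)."""
    start = offset
    out = []
    for value in values:
        start += 1
        out.append((start - 1, value))
    return out


def index_from_minus_one(values, offset=0):
    """Counter starts one below the offset, so the incremented value
    can be emitted directly without a decrement."""
    start = offset - 1
    out = []
    for value in values:
        start += 1
        out.append((start, value))
    return out
```

Both variants should produce the same indices; the second one only drops the
per-record subtraction.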
> zipWithIndex in Python API
> --------------------------
>
> Key: FLINK-3626
> URL: https://issues.apache.org/jira/browse/FLINK-3626
> Project: Flink
> Issue Type: New Feature
> Components: Python API
> Affects Versions: 1.0.0
> Environment: OS X 10.11.3, 16GB RAM, 500GB SSD, Core i7 2.5GHz.
> Reporter: Shannon Quinn
> Priority: Minor
>
> Implementation of a `zipWithIndex` method for the Python API on Flink. This
> will affix each record with a sequential integer ID, consistent across the
> distributed data structure.
> Work here: https://github.com/magsol/flink
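
For readers unfamiliar with the idea, the two-pass approach suggested by the
diff (count elements per partition, then number records from per-partition
offsets) can be sketched without Flink. This is an illustration only:
partitions are modelled as plain Python lists, and these stand-alone
functions are hypothetical stand-ins for the DataSet methods in the pull
request.

```python
from itertools import accumulate


def count_elements_per_partition(partitions):
    """First pass: return (partition index, element count) pairs."""
    return [(i, sum(1 for _ in part)) for i, part in enumerate(partitions)]


def zip_with_index(partitions):
    """Second pass: pair each record with a globally sequential index."""
    # Sort by partition index so the offsets line up with the partitions.
    counts = [count for _, count in sorted(count_elements_per_partition(partitions))]
    # The offset of partition i is the total count of all earlier partitions.
    offsets = [0] + list(accumulate(counts))[:-1]
    return [
        [(offset + i, value) for i, value in enumerate(part)]
        for offset, part in zip(offsets, partitions)
    ]
```

With partitions `[["a", "b"], [], ["c"]]` the offsets are 0, 2 and 2, so the
indices stay sequential even though the middle partition is empty.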