kaknikhil commented on a change in pull request #533:
URL: https://github.com/apache/madlib/pull/533#discussion_r564706310



##########
File path: src/ports/postgres/modules/utilities/utilities.py_in
##########
@@ -75,6 +75,35 @@ def get_segments_per_host():
         return max(1, count)
 # ------------------------------------------------------------------------------
 
+def get_data_distribution_per_segment(table_name):
+    """
+    Returns a list with count of segments on each host that the input
+    table's data is distributed on.
+    :param table_name: input table name
+    :return: list with count of segments on each host that the input
+    table's data is distributed on.
+    len(return list) = total num of segments in cluster
+    """
+    if is_platform_pg():
+        return [1]
+    else:
+        res = plpy.execute("""
+                    WITH CTE AS (SELECT DISTINCT(gp_segment_id)
+                                 FROM {table_name})
+                    SELECT content, count as cnt
+                        FROM gp_segment_configuration
+                        JOIN (SELECT hostname, count(*)
+                              FROM gp_segment_configuration
+                              WHERE content in (SELECT * FROM cte)
+                              GROUP BY hostname) a
+                        USING (hostname)
+                        WHERE content in (SELECT * FROM cte)
+                    ORDER BY 1""".format(table_name=table_name))
+        data_distribution_per_segment = [0] * get_seg_number()
+        for r in res:
+            data_distribution_per_segment[r['content']] = int(r['cnt'])
+        return data_distribution_per_segment

Review comment:
       You are right, this function is a bit specific to deep learning. For 
this PR, we can move it into the deep learning module; any further 
refactoring can be handled in a future PR.
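
For anyone reading along, the shape of the list this function returns may be easier to see outside of SQL. The following is only a rough pure-Python sketch of the same idea, not the code in the PR: `data_distribution_sketch`, `seg_to_host`, and `segments_with_data` are hypothetical names, and it assumes exactly one configuration row per segment id (so no mirror or master rows) with ids numbered from 0.

```python
from collections import Counter


def data_distribution_sketch(seg_to_host, segments_with_data):
    """
    Hypothetical stand-in for the query in the diff above.

    :param seg_to_host: dict of segment id -> hostname, assumed to hold one
                        entry per segment with ids 0..n-1
    :param segments_with_data: set of segment ids that hold rows of the table
    :return: list indexed by segment id; each entry is the number of
             data-holding segments on that segment's host, or 0 if the
             segment itself holds no data
    """
    # Mirrors the inner "GROUP BY hostname": count data-holding segments per host.
    per_host = Counter(seg_to_host[seg] for seg in segments_with_data)

    # Mirrors "[0] * get_seg_number()": segments without data stay at 0.
    distribution = [0] * len(seg_to_host)
    for seg in segments_with_data:
        distribution[seg] = per_host[seg_to_host[seg]]
    return distribution


# Two hosts with two segments each; the table has data on segments 0, 1 and 3.
example = data_distribution_sketch(
    {0: "host1", 1: "host1", 2: "host2", 3: "host2"},
    {0, 1, 3},
)
print(example)  # [2, 2, 0, 1]
```

Under these assumptions, the list has one entry per segment, which is what the docstring's "len(return list) = total num of segments in cluster" describes.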




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

