[ https://issues.apache.org/jira/browse/SPARK-41945 ]
jiaan.geng deleted comment on SPARK-41945:
------------------------------------

    was (Author: beliefer):
I'm working on it.

> Python: connect client lost column data with pyarrow.Table.to_pylist
> --------------------------------------------------------------------
>
>                 Key: SPARK-41945
>                 URL: https://issues.apache.org/jira/browse/SPARK-41945
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: jiaan.geng
>            Priority: Major
>
> Python: connect client should not use pyarrow.Table.to_pylist to transform
> fetched data.
> For example, the data in the pyarrow.Table is shown below.
> {code:java}
> pyarrow.Table
> key: string
> order: int64
> nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
> nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
> nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): string
> ----
> key: [["a","a","a","a","a","b","b"]]
> order: [[0,1,2,3,4,1,2]]
> nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): [[null,"x","x","x","x",null,null]]
> nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): [[null,"x","x","x","x",null,null]]
> nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): [[null,null,"y","y","y",null,null]]
> {code}
> The table has five columns, as shown above.
> But the data after calling pyarrow.Table.to_pylist() is shown below.
> {code:java}
> [{
>   'key': 'a',
>   'order': 0,
>   'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None,
>   'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None
> }, {
>   'key': 'a',
>   'order': 1,
>   'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
>   'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None
> }, {
>   'key': 'a',
>   'order': 2,
>   'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
>   'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'y'
> }, {
>   'key': 'a',
>   'order': 3,
>   'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
>   'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'y'
> }, {
>   'key': 'a',
>   'order': 4,
>   'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'x',
>   'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': 'y'
> }, {
>   'key': 'b',
>   'order': 1,
>   'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None,
>   'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None
> }, {
>   'key': 'b',
>   'order': 2,
>   'nth_value(value, 2) OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None,
>   'nth_value(value, 2) ignore nulls OVER (PARTITION BY key ORDER BY order ASC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)': None
> }]
> {code}
> Only four columns are left: the two columns that share the same name
> collapse into a single dict key.
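
The column loss reported above can be illustrated without pyarrow. What follows is a minimal plain-Python sketch, not the Connect client's code: it models a per-row dict conversion of the kind `to_pylist` produces and contrasts it with positional rows. The helper names `rows_as_dicts` and `rows_as_tuples` are hypothetical, and the long window-expression column names from the issue are shortened (an assumption made for readability); the column values are the ones shown in the issue.

```python
# Column names and data modelled on the table in the issue.
# The names are shortened stand-ins (assumption); the point is that
# "nth_value" appears twice, just like the duplicated window column.
names = ["key", "order", "nth_value", "nth_value", "nth_value_ignore_nulls"]
columns = [
    ["a", "a", "a", "a", "a", "b", "b"],
    [0, 1, 2, 3, 4, 1, 2],
    [None, "x", "x", "x", "x", None, None],
    [None, "x", "x", "x", "x", None, None],
    [None, None, "y", "y", "y", None, None],
]


def rows_as_dicts(names, columns):
    """One dict per row, keyed by column name.

    A dict holds each key once, so a repeated column name overwrites
    the earlier entry and the row ends up with fewer fields.
    """
    return [dict(zip(names, values)) for values in zip(*columns)]


def rows_as_tuples(columns):
    """One tuple per row, addressed by position, so no column is lost."""
    return list(zip(*columns))


dict_rows = rows_as_dicts(names, columns)
tuple_rows = rows_as_tuples(columns)

print(len(dict_rows[0]))   # 4 fields: the duplicate name collapsed
print(len(tuple_rows[0]))  # 5 fields: every column kept
```

Keeping the values positional and carrying the column names in a separate list is one way to preserve duplicate names; whether the actual fix for this issue takes that shape is not stated in the message.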