TheNeuralBit commented on a change in pull request #13126:
URL: https://github.com/apache/beam/pull/13126#discussion_r506765257
##########
File path: sdks/python/apache_beam/dataframe/schemas.py
##########
@@ -146,15 +146,21 @@ class BatchRowsAsDataFrame(beam.PTransform):
Batching parameters are inherited from
:class:`~apache_beam.transforms.util.BatchElements`.
"""
- def __init__(self, *args, **kwargs):
+ def __init__(self, *args, proxy, **kwargs):
Review comment:
I think this should just generate the proxy from `pcoll.element_type` in
`expand`. I don't know if there's a good way to retrieve it in `to_pcollection`
though... it could just be re-generated or memoized.
This is one reason I was thinking about a pandas typehint. If we had that,
this transform could annotate the output PCollection with a typehint containing
the proxy, and `to_pcollection` would be able to retrieve it.
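A minimal sketch of the "generate in expand, memoize for later" idea, using only the standard library. Everything here is a hypothetical stand-in: `Animal` plays the role of `pcoll.element_type`, `proxy_for` is an illustrative helper (not a Beam API), the "proxy" is a tuple of (column, type) pairs rather than an empty pandas DataFrame, and `BatchRowsSketch.expand` takes the element type directly instead of a PCollection.

```python
import functools
import typing


class Animal(typing.NamedTuple):
    # Hypothetical schema'd element type, standing in for pcoll.element_type.
    name: str
    legs: int


@functools.lru_cache(maxsize=None)
def proxy_for(element_type):
    """Hypothetical helper: derive a proxy from a NamedTuple element type.

    The real proxy would be an empty pandas DataFrame; a tuple of
    (column name, type) pairs is used here to keep the sketch stdlib-only.
    The cache means a later caller gets the same object back.
    """
    return tuple(typing.get_type_hints(element_type).items())


class BatchRowsSketch:
    """Stand-in for the transform: __init__ takes no proxy argument."""
    def __init__(self, *args, **kwargs):
        self._args = args
        self._kwargs = kwargs

    def expand(self, element_type):
        # Assumption: the real expand would read pcoll.element_type here.
        return proxy_for(element_type)


proxy = BatchRowsSketch().expand(Animal)
# A later caller (e.g. something like to_pcollection) can ask again and
# receive the memoized proxy instead of rebuilding it.
same_proxy = proxy_for(Animal)
```

With memoization keyed on the element type, the constructor no longer needs a `proxy` keyword, at the cost of requiring the element type to be hashable.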
##########
File path: sdks/python/apache_beam/dataframe/convert.py
##########
@@ -112,23 +118,7 @@ def to_pcollection(
if label is None:
# Attempt to come up with a reasonable, stable label by retrieving the name
# of these variables in the calling context.
- current_frame = inspect.currentframe()
- if current_frame is None:
- label = 'ToDataframe(...)'
-
- else:
- previous_frame = current_frame.f_back
-
- def name(obj):
- for key, value in previous_frame.f_locals.items():
- if obj is value:
- return key
- for key, value in previous_frame.f_globals.items():
- if obj is value:
- return key
- return '...'
-
- label = 'ToDataframe(%s)' % ', '.join(name(e) for e in dataframes)
Review comment:
Whoops, glad you noticed the typo I missed the first time around.