TheNeuralBit commented on issue #23467:
URL: https://github.com/apache/beam/issues/23467#issuecomment-1269044871

   Thanks Cham, I'm surprised we didn't already have this tracked somewhere :)
   
   We also need this for `df.predict` (or `predict(df, ..)`).
   
   There are a couple of challenges here that have been rattling around in my 
head:
   - For many ModelHandlers the input/output type is basically a bag of numbers, 
e.g. a Tensor with dimensions (X,Y). It's ambiguous how these should be mapped 
to Beam schemas.
     - It could be a schema with a single field of type `List[List[int64]]`
     - Or perhaps one dimension corresponds to the schema fields (e.g. X fields 
of type `List[int64]`)
     - This is particularly problematic for the `df.predict` case, since the 
pandas type system doesn't support complex types.
   - On the output side, we likely can't get a detailed, parameterized type to 
map back to Beam schemas. That is, we may know that the model produces a 
Tensor, but we don't know the dimensions, which one is the batch dimension, 
etc.:
     - In some cases we may be able to use the `proxy` trick from the DataFrame 
API: pass through an instance with 0-length batch dimension and see what we get 
out. But I don't know if this will work universally.
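
To make the two mapping options and the `proxy` idea concrete, here is a rough sketch using only NumPy and `typing`. It is not Beam or RunInference code: `SingleField`, `as_single_field`, `as_fields`, `infer_output_spec`, `toy_model` and the `f0`, `f1` field names are all hypothetical, and the sketch assumes the batch dimension is the first one.

```python
from typing import List, NamedTuple

import numpy as np


# Option 1: the whole (X, Y) tensor is one field holding nested lists.
class SingleField(NamedTuple):
    values: List[List[int]]


def as_single_field(tensor: np.ndarray) -> SingleField:
    return SingleField(values=tensor.tolist())


# Option 2: the first dimension becomes X fields, each a list of length Y.
# The field names (f0, f1, ...) are made up for illustration.
def as_fields(tensor: np.ndarray) -> dict:
    return {f"f{i}": row.tolist() for i, row in enumerate(tensor)}


# Proxy trick: run the model on a zero-length batch and inspect the result.
# Assumes the batch dimension comes first and the model accepts empty input.
def infer_output_spec(model, feature_shape, dtype=np.int64):
    proxy = np.empty((0, *feature_shape), dtype=dtype)
    out = model(proxy)
    return out.shape[1:], out.dtype


def toy_model(batch: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for a model: sums the features of each element.
    return batch.sum(axis=1, keepdims=True).astype(np.float32)


tensor = np.arange(6, dtype=np.int64).reshape(2, 3)
single = as_single_field(tensor)   # one field containing a 2x3 nested list
fields = as_fields(tensor)         # two fields, each a list of three ints
spec = infer_output_spec(toy_model, feature_shape=(3,))
```

The proxy call only reports the per-element shape and dtype when the model tolerates an empty batch and its output shape does not depend on the data, which is the part that may not hold universally.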
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
