fmcquillan99 commented on a change in pull request #445: updated DL preprocessor docs for bytea
URL: https://github.com/apache/madlib/pull/445#discussion_r330211862
##########
File path:
src/ports/postgres/modules/deep_learning/input_data_preprocessor.sql_in
##########
@@ -209,25 +214,39 @@ validation_preprocessor_dl(source_table,
validation_preprocessor_dl() contain the following columns:
<table class="output">
<tr>
- <th>buffer_id</th>
- <td>INTEGER. Unique id for each row in the packed table.
+ <th>independent_var</th>
+ <td>BYTEA. Packed array of independent variables in PostgreSQL bytea format.
</td>
</tr>
<tr>
<th>dependent_var</th>
- <td>ANYARRAY[]. Packed array of dependent variables.
+ <td>BYTEA. Packed array of dependent variables in PostgreSQL bytea format.
The dependent variable is always one-hot encoded as an
- INTEGER[] array. For now, we are assuming that
+ integer array. For now, we are assuming that
input_preprocessor_dl() will be used
only for classification problems using deep learning. So
the dependent variable is one-hot encoded, unless it's already a
numeric array in which case we assume it's already one-hot
- encoded and just cast it to an INTEGER[] array.
+ encoded and just cast it to an integer array.
</td>
</tr>
<tr>
- <th>independent_var</th>
- <td>REAL[]. Packed array of independent variables.
+ <th>independent_var_shape</th>
+ <td>INTEGER[]. Shape of the independent variable array after preprocessing.
+ The first element is the number of images packed per row, and subsequent
+ elements will depend on how the image is described (e.g., channels first or last).
+ </td>
+ </tr>
+ <tr>
+ <th>dependent_var_shape</th>
+ <td>INTEGER[]. Shape of the dependent variable array after preprocessing.
+ The first element is the number of images packed per row, and the second
+ element is the number of class values.
Review comment:
We already cover one-hot encoding in some detail under
`dependent_varname` in the function definition, so I am hoping that is
sufficient. If not, please let me know.
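
For readers of this thread, here is a minimal sketch of how a one-hot encoded dependent variable ends up with a shape of (rows packed per buffer, number of class values). It is illustrative only and is not the preprocessor's actual code: the helper names `one_hot` and `pack_buffer` are hypothetical, and the 16-bit integer byte layout is an assumption standing in for whatever the bytea column really stores.

```python
import struct


def one_hot(labels, classes):
    """Return one row per label, with a 1 at the index of that label's class."""
    index = {c: i for i, c in enumerate(classes)}
    return [[1 if index[label] == i else 0 for i in range(len(classes))]
            for label in labels]


def pack_buffer(rows):
    """Flatten the rows into a byte string and report the 2-D shape.

    The little-endian int16 layout is an assumption made for this sketch.
    """
    shape = [len(rows), len(rows[0])]
    flat = [value for row in rows for value in row]
    packed = struct.pack("<%dh" % len(flat), *flat)
    return packed, shape


# Hypothetical data: three images packed into one buffer, three class values.
labels = ["cat", "dog", "cat"]
classes = ["cat", "dog", "bird"]

encoded = one_hot(labels, classes)
packed, shape = pack_buffer(encoded)
print(shape)  # [3, 3]
```

The shape has to travel alongside the packed bytes because a flat byte string on its own no longer records how many rows or classes it holds.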