[
https://issues.apache.org/jira/browse/PHOENIX-66?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13929826#comment-13929826
]
James Taylor commented on PHOENIX-66:
-------------------------------------
Patch looks good, [~gabriel.reid]. Thanks so much. A couple of minor items:
- Can you use the standard JDBC APIs for instantiating the array instead of the
internal PArrayDataType methods? See ArrayTest for examples.
- Do you have an error check for the array delimiter being the same as the
field delimiter, as this would cause issues, no?
- Can you use our tab/spacing conventions and compiler settings (see Eclipse
prefs in phoenix/dev dir)?
- We've gotten away from normalizing column names automatically, as it causes
problems if folks use case sensitive names. Would you mind updating that?
- Do all tests pass?
- Can the patch be applied to 3.0, 4.0, and master (as we have three branches
now)? If not, would you mind attaching separate patches for the different
branches?
[~jeffreyz] - are you ok with this change? [~mujtaba] and
[[email protected]] - any feedback on this?
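As a sketch of what the standard-JDBC route plus the delimiter-clash check could look like (the class and method names here are illustrative only, not Phoenix code):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Illustrative helper (not Phoenix code): parse a bracketed CSV array field
// into its elements, rejecting an array delimiter that clashes with the
// field delimiter, as suggested above.
public class CsvArrayField {

    static Object[] parseArrayField(String raw, char arrayDelim, char fieldDelim) {
        if (arrayDelim == fieldDelim) {
            throw new IllegalArgumentException(
                "Array delimiter must differ from field delimiter: '" + arrayDelim + "'");
        }
        String trimmed = raw.trim();
        if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
            throw new IllegalArgumentException("Not an array literal: " + raw);
        }
        // Strip the surrounding brackets, then split on the array delimiter.
        String body = trimmed.substring(1, trimmed.length() - 1);
        String[] parts = body.split(Pattern.quote(String.valueOf(arrayDelim)));
        Object[] elements = new Object[parts.length];
        for (int i = 0; i < parts.length; i++) {
            elements[i] = parts[i].trim();
        }
        return elements;
    }

    public static void main(String[] args) {
        Object[] elements = parseArrayField("[foo; 1; bar]", ';', ',');
        System.out.println(Arrays.toString(elements)); // prints [foo, 1, bar]
        // With a live connection, the standard JDBC calls would then be:
        //   java.sql.Array arr = conn.createArrayOf("VARCHAR", elements);
        //   stmt.setArray(4, arr);
    }
}
```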
> Support array creation from CSV file
> ------------------------------------
>
> Key: PHOENIX-66
> URL: https://issues.apache.org/jira/browse/PHOENIX-66
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Fix For: 3.0.0
>
> Attachments: PHOENIX-66-intermediate.patch, PHOENIX-66.patch
>
>
> We should support being able to parse an array defined in our CSV file.
> Perhaps something like this:
> a, b, c, [foo, 1, bar], d
> We'd know (from the data type of the column), that we have an array for the
> fourth field here.
> One option to support this would be to implement the
> PDataType.toObject(String) for the ARRAY PDataType enums. That's not ideal,
> though, as we'd introduce a dependency from PDataType to our CSVLoader, since
> we'd need to in turn parse each element. Also, we don't have a way to pass
> through the custom delimiters that might be in use.
> Another pretty trivial, though a bit more constrained, approach would be to
> look at the column ARRAY_SIZE to control how many of the next CSV columns
> should be used as array elements. In this approach, you wouldn't use the
> square brackets at all. You can get the ARRAY_SIZE from the column metadata
> via the connection.getMetaData().getColumns() call, then
> resultSet.getInt("ARRAY_SIZE"). However, the ARRAY_SIZE is optional in a DDL
> statement, so we'd need to do something for the case where it's not specified.
> A third option would be to handle most of the parsing in the CSVLoader. We
> could use the above bracket syntax, and then collect up the next set of CSV
> field elements until we hit the unescaped ']'. Then we'd use our standard
> JDBC APIs to build the array and continue on our merry way.
> What do you think, [~jviolettedsiq]? Or [~bruno], maybe you can take a crack
> at it?
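The collect-until-unescaped-']' step described in the third option could be sketched as follows (a standalone illustration, assuming the record has already been split on the field delimiter; the class and method names are hypothetical, not Phoenix code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the third option: walk the already-split CSV
// fields and, on an opening '[', collect subsequent fields until the one
// that closes the array with an unescaped ']'.
public class CsvArrayCollector {

    static List<Object> parseRecord(String[] fields) {
        List<Object> values = new ArrayList<>();
        for (int i = 0; i < fields.length; i++) {
            String f = fields[i].trim();
            if (f.startsWith("[")) {
                List<String> elements = new ArrayList<>();
                String current = f.substring(1);
                while (!endsUnescaped(current)) {
                    elements.add(current.trim());
                    current = fields[++i].trim();
                }
                // Drop the closing ']' from the final element.
                elements.add(current.substring(0, current.length() - 1).trim());
                // With a live connection, this is where the standard JDBC
                // call would go:
                //   conn.createArrayOf("VARCHAR", elements.toArray());
                values.add(elements);
            } else {
                values.add(f);
            }
        }
        return values;
    }

    private static boolean endsUnescaped(String s) {
        return s.endsWith("]") && !s.endsWith("\\]");
    }

    public static void main(String[] args) {
        String[] fields = "a, b, c, [foo, 1, bar], d".split(",");
        System.out.println(parseRecord(fields));
        // prints [a, b, c, [foo, 1, bar], d]
    }
}
```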
--
This message was sent by Atlassian JIRA
(v6.2#6252)