[ https://issues.apache.org/jira/browse/PHOENIX-66?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabriel Reid updated PHOENIX-66:
--------------------------------

    Attachment: PHOENIX-66-intermediate.patch

I started working on this one, but I've ended up doing a fair bit of 
refactoring of the CSV loading in general: specifically, a dedicated 
(separate) implementation for going from CSV records to upserts that is shared 
by both the normal CSV loader and the MapReduce-based loader, along with some 
general cleanup of the involved classes. A rough sketch of the shape I have in 
mind follows below.
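
To make the direction concrete, here's an illustrative sketch of that shared 
piece; the class and method names are made up for illustration and aren't 
taken from the attached patch:

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class CsvRecordUpserter {

    private final PreparedStatement upsertStmt;

    public CsvRecordUpserter(Connection conn, String tableName,
            List<String> columns) throws SQLException {
        // Build "UPSERT INTO t (c1, ...) VALUES (?, ...)" once and reuse it
        // for every record, whether driven by the command-line loader or by
        // a MapReduce task.
        StringBuilder cols = new StringBuilder();
        StringBuilder params = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                cols.append(", ");
                params.append(", ");
            }
            cols.append(columns.get(i));
            params.append("?");
        }
        String sql = "UPSERT INTO " + tableName
                + " (" + cols + ") VALUES (" + params + ")";
        this.upsertStmt = conn.prepareStatement(sql);
    }

    /** Binds the values of one parsed CSV record and executes the upsert. */
    public void upsert(List<String> csvValues) throws SQLException {
        for (int i = 0; i < csvValues.size(); i++) {
            upsertStmt.setString(i + 1, csvValues.get(i));
        }
        upsertStmt.execute();
    }
}
{code}

Both loaders would construct one of these and feed it parsed records, keeping 
the record-to-upsert logic in a single place.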

I was planning to build further on this to implement the array import, but 
wanted to check first that the direction is okay with everyone. The 
intermediate refactoring patch is attached for reference, although it's not 
ready for commit yet (it's three commits which can be squashed together later 
if needed). Any objections?

> Support array creation from CSV file
> ------------------------------------
>
>                 Key: PHOENIX-66
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-66
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>             Fix For: 3.0.0
>
>         Attachments: PHOENIX-66-intermediate.patch
>
>
> We should support being able to parse an array defined in our CSV file. 
> Perhaps something like this:
> a, b, c, [foo, 1, bar], d
> We'd know (from the data type of the column) that we have an array for the 
> fourth field here.
> One option to support this would be to implement 
> PDataType.toObject(String) for the ARRAY PDataType enums. That's not ideal, 
> though, as it would introduce a dependency from PDataType to our CSVLoader, 
> since we'd in turn need to parse each element. Also, there'd be no way to 
> pass through the custom delimiters that might be in use.
> Another pretty trivial, though somewhat more constrained, approach would be 
> to look at the column's ARRAY_SIZE to control how many of the next CSV 
> fields should be used as array elements. In this approach, you wouldn't use 
> the square brackets at all. You can get the ARRAY_SIZE from the column 
> metadata returned by a connection.getMetaData().getColumns() call, via 
> resultSet.getInt("ARRAY_SIZE") (see the sketch after this description). 
> However, the ARRAY_SIZE is optional in a DDL statement, so we'd need to do 
> something for the case where it's not specified.
> A third option would be to handle most of the parsing in the CSVLoader. We 
> could use the above bracket syntax, and then collect the next set of CSV 
> fields until we hit an unescaped ']' (also sketched below). Then we'd use 
> our standard JDBC APIs to build the array and continue on our merry way.
> What do you think, [~jviolettedsiq]? Or [~bruno], maybe you can take a crack 
> at it?
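
For reference on the second option above, here's a minimal sketch of the 
ARRAY_SIZE metadata lookup, using only standard JDBC calls. The table and 
column names are made up, and the unspecified-ARRAY_SIZE case is surfaced but 
not solved:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class ArraySizeLookup {
    public static void main(String[] args) throws Exception {
        // Connection URL is illustrative.
        Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
        // Hypothetical table/column names; ARRAY_SIZE is the column in the
        // metadata result set mentioned in the description above.
        ResultSet rs = conn.getMetaData()
                .getColumns(null, null, "MY_TABLE", "MY_ARRAY_COL");
        while (rs.next()) {
            int arraySize = rs.getInt("ARRAY_SIZE");
            // ARRAY_SIZE is optional in the DDL, so it may come back NULL;
            // that's the unresolved case noted in the description.
            if (rs.wasNull()) {
                System.out.println("ARRAY_SIZE not specified");
            } else {
                System.out.println("ARRAY_SIZE = " + arraySize);
            }
        }
        conn.close();
    }
}
{code}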
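And a sketch of the third option's collection step, assuming the CSV fields 
have already been split on the field delimiter and that ']' never appears 
escaped inside an element (real code would honor the configured escape 
character). The VARCHAR element type stands in for whatever the column 
metadata reports:

{code:java}
import java.sql.Array;
import java.sql.Connection;
import java.util.ArrayList;
import java.util.List;

public class BracketArrayCollector {

    /**
     * Collects fields between an opening '[' and a closing ']' into a
     * java.sql.Array built through the standard JDBC API; all other fields
     * pass through unchanged.
     */
    static List<Object> collect(String[] fields, Connection conn)
            throws Exception {
        List<Object> out = new ArrayList<Object>();
        List<String> elems = null; // non-null while inside [...]
        for (String raw : fields) {
            String f = raw.trim();
            if (elems == null && f.startsWith("[")) {
                elems = new ArrayList<String>();
                f = f.substring(1);
            }
            if (elems == null) {
                out.add(f);
            } else if (f.endsWith("]")) { // simplification: ']' unescaped
                elems.add(f.substring(0, f.length() - 1));
                // "VARCHAR" stands in for the element type reported by the
                // column metadata.
                Array arr = conn.createArrayOf("VARCHAR", elems.toArray());
                out.add(arr);
                elems = null;
            } else {
                elems.add(f);
            }
        }
        return out;
    }
}
{code}

The resulting values could then be bound with setArray/setString in the shared 
record-to-upsert path sketched earlier in this comment.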



--
This message was sent by Atlassian JIRA
(v6.2#6252)
