[ 
https://issues.apache.org/jira/browse/PHOENIX-66?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930126#comment-13930126
 ] 

Gabriel Reid commented on PHOENIX-66:
-------------------------------------

Thanks for the review [~jamestaylor]. No problem to make the updates you 
suggested, but I'd like to give you the rationale for the choice made in the 
implementation so you can confirm that you want the changes made as suggested:

{quote}
Can you use the standard JDBC APIs for instantiating the array instead of the 
internal PArrayDataType methods? See ArrayTest for examples.
{quote}

The current impl of StringToArrayConverter intentionally doesn't use 
Connection#createArrayOf, so that it can run without a Connection object and 
its tests can run quickly (i.e. without spinning up a mini-cluster). 
StringToArrayConverter is also package-private for the same reason (i.e. it 
exposes the ability to create arrays without a connection). CSV array import 
is also covered elsewhere, where general CSV loading is tested, but I wanted 
to make running the StringToArrayConverter tests very quick and painless.
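To make the testability point concrete, here's a rough sketch of the kind of 
connection-free conversion I mean -- the class and method names below are 
illustrative only, not the actual patch's API:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical sketch: convert a delimited string into a String[] without
// needing a java.sql.Connection, so unit tests don't require a mini-cluster.
public class StringToArrayConverterSketch {

    // Split the raw field value on the array separator and trim each element.
    public static String[] toStringArray(String input, String separator) {
        if (input == null || input.isEmpty()) {
            return new String[0];
        }
        String[] parts = input.split(Pattern.quote(separator), -1);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        // Prints [foo, 1, bar]
        System.out.println(Arrays.toString(toStringArray("foo:1:bar", ":")));
    }
}
```

A test for this needs nothing but plain JUnit -- no Connection, no cluster -- 
which is exactly the quick feedback loop I was after.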

{quote}
Do you have an error check for the array delimiter being the same as the field 
delimiter, as this would cause issues, no?
{quote}

The lack of a check for the array delimiter being the same as the field 
delimiter is somewhat intentional. The rationale is that if a user imports a 
CSV file and supplies the same delimiter for fields and arrays, their file 
uses a single delimiter for both cases, so there is no way to parse it 
unambiguously (and this would be an issue even if array delimiters weren't 
supported). The bigger problem I'm worried about is the supplied field 
delimiter clashing with the default array separator when importing a file that 
doesn't contain arrays -- the tool would then report an error that is probably 
pretty illogical and confusing for a user, and they would be forced to supply 
a custom array separator just to import a file that doesn't contain arrays.

{quote}
Can you use our tab/spacing conventions and compiler settings (see Eclipse 
prefs in phoenix/dev dir)?
{quote}
Really sorry about this one -- the files I was editing were using 2 spaces for 
indentation, and I didn't check whether this was the standard across the whole 
project. I'm using IntelliJ, but I've imported the Eclipse prefs.

You mentioned compiler settings as well, but I'm not sure what those would be 
(and I don't think IntelliJ would import them from Eclipse anyhow). Could you 
give me any more info on this, or is it documented anywhere?

As I mentioned, for the first two points (and of course all the others) I'm 
happy to update the patch, but could you confirm that you still want those two 
specific changes having seen the rationale behind the current implementation?

> Support array creation from CSV file
> ------------------------------------
>
>                 Key: PHOENIX-66
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-66
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>             Fix For: 3.0.0
>
>         Attachments: PHOENIX-66-intermediate.patch, PHOENIX-66.patch
>
>
> We should support being able to parse an array defined in our CSV file. 
> Perhaps something like this:
> a, b, c, [foo, 1, bar], d
> We'd know (from the data type of the column), that we have an array for the 
> fourth field here.
> One option to support this would be to implement the 
> PDataType.toObject(String) for the ARRAY PDataType enums. That's not ideal, 
> though, as we'd introduce a dependency from PDataType to our CSVLoader, since 
> we'd need to in turn parse each element. Also, we don't have a way to pass 
> through the custom delimiters that might be in use.
> Another pretty trivial, though a bit more constrained approach would be to 
> look at the column ARRAY_SIZE to control how many of the next CSV columns 
> should be used as array elements. In this approach, you wouldn't use the 
> square brackets at all. You can get the ARRAY_SIZE from the column metadata 
> through connection.getMetaData().getColumns() call, through 
> resultSet.getInt("ARRAY_SIZE"); However, the ARRAY_SIZE is optional in a DDL 
> statement, so we'd need to do something for the case where it's not specified.
> A third option would be to handle most of the parsing in the CSVLoader. We 
> could use the above bracket syntax, and then collect up the next set of CSV 
> field elements until we hit the unescaped ']'. Then we'd use our standard 
> JDBC APIs to build the array and continue on our merry way.
> What do you think, [~jviolettedsiq]? Or [~bruno], maybe you can take a crack 
> at it?
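
For illustration, the bracket-collecting parse described in the third option 
above might look roughly like the following sketch -- the class name and 
details are hypothetical (and escaping of ']' is ignored for brevity; real 
code would honor it):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "third option": after a normal CSV split,
// collect consecutive fields from an opening '[' through the closing ']'
// into a single array value.
public class CsvArrayCollector {

    // Returns plain fields as single-element lists and each bracketed run
    // of fields as one multi-element list.
    public static List<List<String>> collect(String[] fields) {
        List<List<String>> result = new ArrayList<>();
        List<String> current = null;  // non-null while inside [...]
        for (String raw : fields) {
            String f = raw.trim();
            if (current == null) {
                if (f.startsWith("[")) {
                    current = new ArrayList<>();
                    f = f.substring(1).trim();
                } else {
                    List<String> single = new ArrayList<>();
                    single.add(f);
                    result.add(single);
                    continue;
                }
            }
            if (f.endsWith("]")) {
                current.add(f.substring(0, f.length() - 1).trim());
                result.add(current);  // array value complete
                current = null;
            } else {
                current.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [[a], [b], [c], [foo, 1, bar], [d]]
        System.out.println(collect("a, b, c, [foo, 1, bar], d".split(",")));
    }
}
```

The collected element lists could then be fed to Connection#createArrayOf to 
build the actual array value.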



--
This message was sent by Atlassian JIRA
(v6.2#6252)
