:-) that did the trick it seems!
2013/8/23 Balendra Singh <[email protected]>:
> +1
>
> Thanks,
> Balendra
>
> On 23 August 2013 13:13, Hans Drexler <[email protected]> wrote:
>
>> +1
>>
>> Hans
>>
>> -----Original Message-----
>> From: Kasper Sørensen [mailto:[email protected]]
>> Sent: Tuesday, August 20, 2013 4:27 PM
>> To: [email protected]
>> Subject: Re: [DISCUSS] use folder name as schema name for file based
>> DataContexts
>>
>> I've updated my gist/patch [1] to also support quotes in the
>> table/column paths. Let's have a vote on this patch, to see if we can
>> get this in.
>>
>> [1] https://gist.github.com/kaspersorensen/6210970
>>
>> 2013/8/20 Kasper Sørensen <[email protected]>:
>> > Agreed on all. Except why should dots in column names be any different
>> > than schema and table names?
>> >
>> > 2013/8/16 Hans Drexler <[email protected]>:
>> >> I believe that probably, *every* convention will have its drawbacks.
>> >> Using a factory can help on one hand, but it can also cause great
>> >> confusion if things get mixed. It also makes things more complex. If
>> >> we clearly document the choice made, I will live with that.
>> >>
>> >> My main point is that we should try to write and document the
>> >> software in such a way that MetaModel users will not get confused. I
>> >> like the quotes idea, since that will allow the user to explicitly
>> >> express what is intended. But then, let's extend it to something
>> >> like this:
>> >>
>> >> "schema_name"."table_name"."column_name"
>> >>
>> >> Where schema_name and table_name can contain dots ("."). (I guess
>> >> column names cannot...)
>> >>
>> >> I hope you don't mind me rambling about this...
>> >>
>> >> kind regards,
>> >>
>> >> Hans
>> >>
>> >> -----Original Message-----
>> >> From: Kasper Sørensen [mailto:[email protected]]
>> >> Sent: Wednesday, August 14, 2013 2:59 PM
>> >> To: [email protected]
>> >> Subject: Re: [DISCUSS] use folder name as schema name for file based
>> >> DataContexts
>> >>
>> >> With those different preferences, we could even consider making
>> >> something like a "TableNameFactory" which converts filenames into
>> >> table names. But I guess the crucial point is which default
>> >> convention to use.
>> >>
>> >> Underscoring makes it a bit cleaner to look at the column or table
>> >> paths, but it also makes the representation less direct. A user
>> >> could start wondering if there are other characters than dots that
>> >> will be replaced by underscores etc.
>> >>
>> >> It should be noted that MM's parser does support dots in both table
>> >> and schema names, so this is probably mostly a question of
>> >> aesthetics.
>> >>
>> >> The ambiguity that you point out is also interesting. So far I
>> >> haven't seen it appear in real life, but technically it could occur
>> >> that you had two pairs of schemas and tables that would generate an
>> >> ambiguous table path. For instance:
>> >>
>> >> Schema: foo.bar
>> >> Table: baz
>> >>
>> >> and
>> >>
>> >> Schema: foo
>> >> Table: bar.baz
>> >>
>> >> The parser would currently favor the second schema ("foo") since it
>> >> incrementally tries for schema/table/column matches with every
>> >> dot-separated token. An improvement to the parser would be to allow
>> >> quote characters, so that you could express your table path like
>> >> this:
>> >>
>> >> "foo.bar".baz
>> >>
>> >> Also I want to note that some databases do support dots in
>> >> schema/table/column names, so this ambiguity can (although rarely)
>> >> also occur in an RDBMS or other data sources. It would also be quite
>> >> common to have some separator (not necessarily a dot) in NoSQL
>> >> database column names, to indicate a nested field.
>> >> In HBase for instance they are referred to using a colon, like
>> >> this: "columnFamily:column".
>> >>
>> >> All in all I am mostly feeling like preserving the dots from the
>> >> filenames, but am also very curious what other people think!
>> >>
>> >> 2013/8/14 Hans Drexler <[email protected]>:
>> >>> Hi,
>> >>>
>> >>> First I agree with bumping this issue. When at the customer, this
>> >>> thing caused a lot of time spent in figuring out what was going on.
>> >>> I am not sure if I like the extension as part of the table name,
>> >>> because:
>> >>> - I would never create a table in a relational database with a dot
>> >>> in the name
>> >>> - It creates an ambiguity. If you have a "full" path name to a
>> >>> column, like "documents.people.csv.name", then it is not clear if
>> >>> the schema name is "documents.people" and the table name is "csv",
>> >>> or that the schema name is "documents" and the table name is
>> >>> "people.csv". It seems natural to me that schema names contain
>> >>> dots, but not table names.
>> >>>
>> >>> Alternatives:
>> >>> - Leave the extension out of the name (probably not acceptable,
>> >>> because then you can no longer have two "tables" differing only in
>> >>> extension). Although I must say that personally I think this would
>> >>> be the best solution.
>> >>>
>> >>> - Use a conventional name, like:
>> >>> Schema name: Folder name
>> >>> Table name: The filename, including extension (all dots replaced by
>> >>> underscores).
>> >>> Resulting in e.g. a column path like this:
>> >>> documents.people_csv.name
>> >>>
>> >>> At the customer site, the file I needed to use was actually named
>> >>> following this pattern: "bar/FOO.PEOPLE.IN.FILE". Using the
>> >>> convention, this would become:
>> >>> bar.FOO_PEOPLE_IN_FILE
>> >>>
>> >>> IMHO this is preferable to "bar.foo.people.in.file"
>> >>>
>> >>> The problem is of course that it would now be impossible to have
>> >>> another file "bar/FOO_PEOPLE_IN_FILE" :-(
>> >>>
>> >>> I am happy to hear other people's thoughts.
>> >>>
>> >>>
>> >>> Hans
>> >>>
>> >>>
>> >>> -----Original Message-----
>> >>> From: Kasper Sørensen [mailto:[email protected]]
>> >>> Sent: Wednesday, August 14, 2013 10:18 AM
>> >>> To: [email protected]
>> >>> Subject: Re: [DISCUSS] use folder name as schema name for file based
>> >>> DataContexts
>> >>>
>> >>> Rats, made a mistake in that diff. The Gist has been updated [1] and
>> >>> now contains the ResourceUtils class which was missing before.
>> >>> [1] https://gist.github.com/kaspersorensen/6210970
>> >>>
>> >>> 2013/8/12 Kasper Sørensen <[email protected]>:
>> >>>> Here's a proposed patch (implemented for CSV and fixedwidth files
>> >>>> which are the modules that implemented the old schema naming pattern):
>> >>>> https://gist.github.com/kaspersorensen/6210970
>> >>>>
>> >>>> 2013/8/10 Kasper Sørensen <[email protected]>:
>> >>>>> https://issues.apache.org/jira/browse/METAMODEL-4
>> >>>>>
>> >>>>> 2013/8/10 Henry Saputra <[email protected]>:
>> >>>>>> What is the JIRA for this one?
>> >>>>>>
>> >>>>>>
>> >>>>>> On Fri, Aug 9, 2013 at 2:26 AM, Manuel van den Berg <
>> >>>>>> [email protected]> wrote:
>> >>>>>>
>> >>>>>>> +1
>> >>>>>>>
>> >>>>>>> (shouldn't I just vote on the Jira for this?)
>> >>>>>>>
>> >>>>>>> manuel
>> >>>>>>>
>> >>>>>>> > -----Original Message-----
>> >>>>>>> > From: Kasper Sørensen [mailto:[email protected]]
>> >>>>>>> > Sent: Friday, August 09, 2013 9:03
>> >>>>>>> > To: [email protected]
>> >>>>>>> > Subject: Re: [DISCUSS] use folder name as schema name for file
>> >>>>>>> > based DataContexts
>> >>>>>>> >
>> >>>>>>> > Allow me to bump this issue (it's my impression that more
>> >>>>>>> > people have joined in a bit late, after this topic was posted).
>> >>>>>>> >
>> >>>>>>> > I think this is one of the more important issues that I would
>> >>>>>>> > want to fix before we make our first release at Apache.
>> >>>>>>> >
>> >>>>>>> > 2013/7/24 Kasper Sørensen <[email protected]>:
>> >>>>>>> > > Right now we have this slightly odd naming convention for
>> >>>>>>> > > schema and table names when building metadata for e.g. a CSV
>> >>>>>>> > > file or a fixed width value file.
>> >>>>>>> > >
>> >>>>>>> > > Schema name: The filename, including file extension.
>> >>>>>>> > > Table name: The filename without extension.
>> >>>>>>> > > Resulting in e.g. a column path like this:
>> >>>>>>> > > people.csv.people.name
>> >>>>>>> > >
>> >>>>>>> > > I suggest we change it to this convention:
>> >>>>>>> > >
>> >>>>>>> > > Schema name: Folder name
>> >>>>>>> > > Table name: The filename, including file extension.
>> >>>>>>> > > Resulting in e.g. a column path like this:
>> >>>>>>> > > documents.people.csv.name
>> >>>>>>> > >
>> >>>>>>> > > Why do I think this would be an improvement?
>> >>>>>>> > >
>> >>>>>>> > > 1) Because it would first of all make sense to the user to
>> >>>>>>> > > see the file system's hierarchy reflected in the schema
>> >>>>>>> > > model.
>> >>>>>>> > > 2) Because it allows us to make these DataContexts operate
>> >>>>>>> > > not on a single file, but on a directory of files. I have
>> >>>>>>> > > seen quite a number of times by now that users of
>> >>>>>>> > > MetaModel, or users of e.g. DataCleaner, which uses
>> >>>>>>> > > MetaModel quite heavily, want to do this sort of stuff.
>> >>>>>>> > > 3) The removing of the file extension stuff is kind of
>> >>>>>>> > > broken and a strange convention in the first place.
>> >>>>>>> > >
>> >>>>>>> > > While this doesn't really break backwards compatibility in
>> >>>>>>> > > terms of Java code, it would break configuration files and
>> >>>>>>> > > other stuff of applications that use MetaModel.
>> >>>>>>> > > But I do believe that can be communicated and handled
>> >>>>>>> > > through carefully explaining the new convention on the
>> >>>>>>> > > migration page (that I recently started writing [1]).
>> >>>>>>> > >
>> >>>>>>> > > What do you think?
>> >>>>>>> > >
>> >>>>>>> > > [1]
>> >>>>>>> > > http://wiki.apache.org/metamodel/MigratingFromEobjectsMetaModel
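
For anyone reading this thread in the archive, the two ideas discussed above (folder name as schema name, and quotes around names that contain dots) can be sketched roughly as follows. This is not MetaModel's parser and not the patch in the gist. It is a minimal Python illustration, and the function names names_for_file, split_path and quote are made up for this example. It assumes double quotes group a name that contains dots and that quote characters never appear inside a name.

```python
import os


def names_for_file(path):
    """Proposed convention: folder name -> schema, filename incl. extension -> table."""
    folder, filename = os.path.split(os.path.normpath(path))
    return os.path.basename(folder), filename


def split_path(path):
    """Split a dot-separated path into tokens; a "quoted" part stays one token.

    Hypothetical helper: quotes inside a name are not supported.
    """
    tokens = []
    current = []
    in_quotes = False
    for ch in path:
        if ch == '"':
            in_quotes = not in_quotes   # toggle on every quote character
        elif ch == "." and not in_quotes:
            tokens.append("".join(current))  # an unquoted dot ends the token
            current = []
        else:
            current.append(ch)
    if in_quotes:
        raise ValueError("unbalanced quote in path: " + path)
    tokens.append("".join(current))
    return tokens


def quote(name):
    """Quote a name only when it contains a dot."""
    return '"' + name + '"' if "." in name else name


schema, table = names_for_file("documents/people.csv")
column_path = ".".join(quote(n) for n in (schema, table, "name"))

print(column_path)                   # quoted form of the column path
print(split_path(column_path))       # schema, table and column recovered
print(split_path('"foo.bar".baz'))   # schema "foo.bar", table "baz"
print(split_path("foo.bar.baz"))     # unquoted: three tokens, still ambiguous
```

Without quotes, foo.bar.baz still splits into three tokens, so a parser has to guess which dots belong to a name. With quotes, the writer of the path decides.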
