Great point. Having the file name itself is very handy.
For one thing, I can make a really slow version of [find]! (seriously, I would love this)

On Thu, Apr 23, 2015 at 7:48 PM, rahul challapalli <challapallira...@gmail.com> wrote:

> I am also of the opinion that we should not assume knowledge on the user
> front for data discovery. So we should either have 'dir' columns in 'select
> *' or support a variation that Ted suggested.
> Also the folder names complement the actual data in some cases.
>
> - Rahul
>
> On Thu, Apr 23, 2015 at 4:38 PM, Daniel Barclay <dbarc...@maprtech.com>
> wrote:
>
> > Regarding the use case in which the user stores information in pathnames:
> >
> > Since Drill supports that use case partially, shouldn't it do so more
> > completely? In particular, since Drill provides access to subtree
> > pathname segments before the last one (the segments for directories),
> > should Drill provide access to the last one too (the simple file name)?
> >
> > We support reading cases like this:
> > - root/
> > - root/2015/
> > - root/2015/01/
> > - root/2015/01/01/
> > - root/2015/01/01/log.json
> > - root/2015/02/
> > - root/2015/02/02/
> > - root/2015/02/02/log.json
> >
> > In particular, querying "select ... from `root` ..." includes the
> > date-portion segments of the pathnames in the dir0, etc., columns.
> >
> > Note that the user might not redundantly store the dates inside the
> > files themselves, since the dates are known to exist in the directory
> > names.
> >
> > However, we don't support this variation of that case, right?:
> >
> > - root/
> > - root/2015/
> > - root/2015/01/
> > - root/2015/01/log_01.json
> > - root/2015/02/
> > - root/2015/02/log_02.json
> >
> > In particular, Drill includes several segments of the pathname after
> > the root of the subtree, but does not include the last segment--which
> > contains data just as the segments that _are_ included do.
> >
> > (Yes, the last segment usually contains artifacts besides the contained
> > data (e.g., the file extension) and the user would have to specify how
> > to interpret the file simple name segment as data, but the user has to
> > specify the interpretation for the other segments anyway.)
> >
> > Daniel
> >
> > Ted Dunning wrote:
> >
> >> I would propose that dir be an array that contains all of the directories
> >> rather than having multiple values.
> >>
> >> The multiple names are particularly inconvenient if files are at
> >> different depths.
> >>
> >> On Thu, Apr 23, 2015 at 5:56 PM, Jacques Nadeau <jacq...@apache.org>
> >> wrote:
> >>
> >>> I'm specifically arguing that SELECT * doesn't return the columns.
> >>>
> >>> Here is current behavior:
> >>>
> >>> /mytdir/mysdir/myfile.json
> >>> {a:1,b:2,c:3}
> >>> {a:4,b:5,c:6}
> >>>
> >>> select * from `myfile.json`
> >>>
> >>> a, b, c
> >>> 1, 2, 3
> >>> 4, 5, 6
> >>>
> >>> select * from `/mysdir/myfile.json`
> >>>
> >>> dir0, a, b, c
> >>> mysdir, 1, 2, 3
> >>> mysdir, 4, 5, 6
> >>>
> >>> select * from `/mytdir/mysdir/myfile.json`
> >>>
> >>> dir0, dir1, a, b, c
> >>> mytdir, mysdir, 1, 2, 3
> >>> mytdir, mysdir, 4, 5, 6
> >>>
> >>> ====================================
> >>> My proposal:
> >>>
> >>> select * from `myfile.json`
> >>> select * from `/mysdir/myfile.json`
> >>> select * from `/mytdir/mysdir/myfile.json`
> >>> ::all produce::
> >>> a, b, c
> >>> 1, 2, 3
> >>> 4, 5, 6
> >>>
> >>> select dir0, a, b, c from `/mysdir/myfile.json`
> >>>
> >>> dir0, a, b, c
> >>> mysdir, 1, 2, 3
> >>> mysdir, 4, 5, 6
> >>>
> >>> select dir0, a, b, c from `/mytdir/mysdir/myfile.json`
> >>>
> >>> dir0, a, b, c
> >>> mytdir, 1, 2, 3
> >>> mytdir, 4, 5, 6
> >>>
> >>> On Thu, Apr 23, 2015 at 5:42 PM, Aman Sinha <asi...@maprtech.com> wrote:
> >>>
> >>>> Seems reasonable, as long as SELECT * also returns the dir# columns.
> >>>>
> >>>> On Thu, Apr 23, 2015 at 2:34 PM, Jacques Nadeau <jacq...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Hey guys,
> >>>>>
> >>>>> I've been thinking that always showing dir# columns seems to alter data
> >>>>> returned from Drill depending on how you select the directory. I'd
> >>>>> propose that we make it so that we only return dir# columns when they are
> >>>>> explicitly requested.
> >>>>>
> >>>>> Thoughts?
> >
> > --
> > Daniel Barclay
> > MapR Technologies
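
To make the options in this thread concrete, here is a minimal Python sketch of how a path below the queried root could be split into directory segments plus the simple file name, and how the segments could be exposed either as numbered dir# columns or as a single array. This is only an illustration of the idea, not Drill's implementation; the function names path_columns and as_numbered_columns are hypothetical.

```python
from pathlib import PurePosixPath


def path_columns(root, file_path):
    """Split the part of file_path below root into (directory segments, file name)."""
    relative = PurePosixPath(file_path).relative_to(root)
    # Every segment except the last is a directory; the last is the simple file name.
    *dirs, filename = relative.parts
    return list(dirs), filename


def as_numbered_columns(dirs):
    """Expose directory segments as dir0, dir1, ... (the multiple-column style)."""
    return {f"dir{i}": segment for i, segment in enumerate(dirs)}


# The array style keeps one 'dir' value whose length varies with file depth.
dirs, filename = path_columns("root", "root/2015/01/log_01.json")
row = {"dir": dirs, "filename": filename, **as_numbered_columns(dirs)}
```

With the array form, files at different depths simply produce arrays of different lengths, whereas the numbered form produces a different set of columns per depth.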