You shouldn't get that error even if you're using the localfs. I will double check that.
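
For reference, a directory-based load might look roughly like the sketch below. The node address 127.0.0.1 and the directory /data/tweets/ are placeholders that do not come from this thread, and it assumes the Tweets dataset from the earlier example already exists. Treat it as an illustration of the idea rather than a verified statement for your build.

```aql
// Hypothetical host and directory: replace with your NC address and data path.
// The path points at a directory instead of listing each .adm file.
load dataset Tweets
using "org.apache.asterix.external.dataset.adapter.NCFileSystemAdapter"
(("path"="127.0.0.1:///data/tweets/"),
 ("format"="adm"));
```

If this still fails with the directory error, the adapter in that build is probably treating the path as a single file, which is what needs checking.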

On Sat, Mar 5, 2016 at 2:41 AM, Xikui Wang <[email protected]> wrote:

> Hi,
>
> @Young-Seok, Thanks for noticing. This is quite convenient for loading
> small batch files.
>
> @Yingyi, Thanks for pointing out the limitations. I tried with my datasets
> (700 x 50MB per file), and it drained all system resources as you expected.
> Actually the mechanism that you mentioned HDFS like localfs is what I am
> looking for. That would be useful for standalone users. Or maybe we just
> don't care standalone users since they are too small. :)
>
> @abdullah, I tried directory path, but it doesn't go through. It raises
> 'xxx is a directory error'. I guess it's because I am using localfs?
>
> Best,
> Xikui
>
> On Fri, Mar 4, 2016 at 2:28 PM, abdullah alamoudi <[email protected]> wrote:
>
> > You can however specify the directory in the path parameter and not the
> > individual files and they will be processed sequentially (or 1 thread per
> > specified path).
> >
> > On Sat, Mar 5, 2016 at 1:04 AM, Young-Seok Kim <[email protected]> wrote:
> >
> > > That makes sense.
> > >
> > > Cheers,
> > > Young-Seok
> > >
> > > On Fri, Mar 4, 2016 at 1:48 PM, Yingyi Bu <[email protected]> wrote:
> > >
> > > > Young-Seok,
> > > >
> > > > That works when the number of local files is relatively small.
> > > > However, when the number of localfs files is 1000, the 1000 files will be
> > > > loaded in parallel simultaneously, which will exhaust all system resources.
> > > > Loading from HDFS doesn't have the problem because the 1000 (or more) file
> > > > splits will be queued into each parallel loader.
> > > >
> > > > Best,
> > > > Yingyi
> > > >
> > > > On Fri, Mar 4, 2016 at 1:42 PM, Young-Seok Kim <[email protected]> wrote:
> > > >
> > > > > You can also load multiple adm files into a same dataset with a single
> > > > > AQL as follows:
> > > > >
> > > > > load dataset Tweets
> > > > > using "org.apache.asterix.external.dataset.adapter.NCFileSystemAdapter"
> > > > > (("path"=
> > > > > "130.149.249.60:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi27-pid0.adm,
> > > > > 130.149.249.53:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi26-pid1.adm,
> > > > > 130.149.249.54:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi25-pid2.adm,
> > > > > 130.149.249.55:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi24-pid3.adm,
> > > > > 130.149.249.56:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi23-pid4.adm,
> > > > > 130.149.249.57:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi22-pid5.adm,
> > > > > 130.149.249.58:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi21-pid6.adm,
> > > > > 130.149.249.59:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi20-pid7.adm"),
> > > > > ("format"="adm"));
> > > > >
> > > > > The above AQL loads 8 adm files into a single dataset named Tweets.
> > > > >
> > > > > Cheers,
> > > > > Young-Seok
> > > > >
> > > > > On Fri, Mar 4, 2016 at 12:19 PM, Xikui Wang <[email protected]> wrote:
> > > > >
> > > > > > Hi Yingyi,
> > > > > >
> > > > > > Thanks for your reply. I think the external dataset with scan query is a
> > > > > > good solution. I will try that. Thank you.
> > > > > >
> > > > > > Best,
> > > > > > Xikui
> > > > > >
> > > > > > On Fri, Mar 4, 2016 at 11:53 AM, Yingyi Bu <[email protected]> wrote:
> > > > > >
> > > > > > > Xikui,
> > > > > > >
> > > > > > > If the number of localfs files is too large, a solution could be to put
> > > > > > > your files on HDFS and then load it. Loading from HDFS always has a fixed
> > > > > > > degree of parallelism regardless of the number of files.
> > > > > > >
> > > > > > > >> I am wondering is there a way to append adm file to existed dataset?
> > > > > > > You can create an external dataset and then write an insert statement where
> > > > > > > the body is a scan query. AsterixDB doesn't load any data into its own
> > > > > > > storage for an external dataset but just keeps file paths.
> > > > > > > Here is a manual for external datasets:
> > > > > > > https://ci.apache.org/projects/asterixdb/aql/externaldata.html
> > > > > > >
> > > > > > > Best,
> > > > > > > Yingyi
> > > > > > >
> > > > > > > On Fri, Mar 4, 2016 at 11:47 AM, Xikui Wang <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I want to import data from multiple adm files into a same dataset. Merging
> > > > > > > > them together and then loading from localfs can be a viable solution, but
> > > > > > > > this may become a problem when the number become too large. I am wondering
> > > > > > > > is there a way to append adm file to existed dataset?
> > > > > > > >
> > > > > > > > Thank you.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Xikui
