Young-Seok,

That works when the number of local files is relatively small.
However, when there are, say, 1000 localfs files, all 1000 files will be
loaded in parallel at the same time, which can exhaust system resources.
Loading from HDFS doesn't have this problem because the 1000 (or more) file
splits are queued into the parallel loaders.
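
To illustrate the queuing behavior described above, here is a minimal Python
sketch. It is not AsterixDB code: the function name, the shared queue, and the
stats dictionary are all hypothetical, and a single shared queue is a
simplification of how splits might be assigned to loaders. It only shows that
with a fixed number of loaders, the number of splits being read at once stays
bounded no matter how many splits there are.

```python
import queue
import threading


def load_with_fixed_parallelism(splits, num_loaders, load_split):
    """Load every split using at most num_loaders concurrent workers.

    Hypothetical illustration only; this is not how AsterixDB is implemented.
    """
    pending = queue.Queue()
    for split in splits:
        pending.put(split)

    lock = threading.Lock()
    stats = {"active": 0, "peak": 0, "loaded": 0}

    def loader():
        while True:
            try:
                split = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to load, this loader is done
            with lock:
                stats["active"] += 1
                stats["peak"] = max(stats["peak"], stats["active"])
            try:
                load_split(split)
                with lock:
                    stats["loaded"] += 1
            finally:
                with lock:
                    stats["active"] -= 1

    workers = [threading.Thread(target=loader) for _ in range(num_loaders)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return stats
```

Calling it with 1000 splits and 4 loaders processes all 1000 splits, while
stats["peak"] never exceeds 4. Starting one thread per file instead would let
the peak grow with the number of files, which is the localfs situation.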

Best,
Yingyi


On Fri, Mar 4, 2016 at 1:42 PM, Young-Seok Kim <[email protected]> wrote:

> You can also load multiple adm files into the same dataset with a single AQL
> statement, as follows:
>
> load dataset Tweets
> using "org.apache.asterix.external.dataset.adapter.NCFileSystemAdapter"
> (("path"=
> "130.149.249.60:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi27-pid0.adm,
> 130.149.249.53:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi26-pid1.adm,
> 130.149.249.54:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi25-pid2.adm,
> 130.149.249.55:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi24-pid3.adm,
> 130.149.249.56:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi23-pid4.adm,
> 130.149.249.57:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi22-pid5.adm,
> 130.149.249.58:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi21-pid6.adm,
> 130.149.249.59:///data/seok.kim/spatial-index-experiment/files/SyntheticTweetsRectangleHouse200M-psi20-pid7.adm"),
> ("format"="adm"));
>
>
> The AQL statement above loads 8 adm files into a single dataset named Tweets.
>
>
> Cheers,
>
> Young-Seok
>
> On Fri, Mar 4, 2016 at 12:19 PM, Xikui Wang <[email protected]> wrote:
>
> > Hi Yingyi,
> >
> > Thanks for your reply. I think an external dataset with a scan query is a
> > good solution.
> > I will try that. Thank you.
> >
> > Best,
> > Xikui
> >
> > On Fri, Mar 4, 2016 at 11:53 AM, Yingyi Bu <[email protected]> wrote:
> >
> > > Xikui,
> > >
> > > If the number of localfs files is too large, a solution could be to put
> > > your files on HDFS and then load them. Loading from HDFS always has a
> > > fixed degree of parallelism regardless of the number of files.
> > >
> > > >> I am wondering, is there a way to append an adm file to an existing dataset?
> > > You can create an external dataset and then write an insert statement
> > > where the body is a scan query. AsterixDB doesn't load any data into its
> > > own storage for an external dataset but just keeps file paths.
> > > Here is a manual for external datasets:
> > > https://ci.apache.org/projects/asterixdb/aql/externaldata.html
> > >
> > > Best,
> > > Yingyi
> > >
> > >
> > > On Fri, Mar 4, 2016 at 11:47 AM, Xikui Wang <[email protected]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I want to import data from multiple adm files into the same dataset.
> > > > Merging them together and then loading from localfs can be a viable
> > > > solution, but this may become a problem when the number of files
> > > > becomes too large. I am wondering, is there a way to append an adm file
> > > > to an existing dataset?
> > > >
> > > > Thank you.
> > > >
> > > > Best,
> > > > Xikui
> > > >
> > >
> >
>
