Totally agree. It's probably better to first make sure it works nicely with that many tasks, and then fix the number of readers.
Cheers,
Abdullah.

On Sun, Feb 21, 2016 at 2:04 AM, Mike Carey <[email protected]> wrote:

> Sounds like the load job parallelism needs a redo - it probably shouldn't
> be more than the number of target partitions IMO...?
> On Feb 20, 2016 12:41 PM, "abdullah alamoudi" <[email protected]> wrote:
>
> > I have an idea that might explain why such a strange behavior happened. I
> > believe it could be due to the number of task partitions being very high
> > assuming each of the 76 files is being read in a separate task.
> > This could potentially lead to some corner cases that we didn't consider
> > before considering the number of threads in the tasks thread pool is less
> > than 76, some tasks will not be able to start until others have completed
> > execution.
> >
> > Just a thought,
> > Abdullah.
> >
> > On Fri, Feb 19, 2016 at 9:43 PM, abdullah alamoudi <[email protected]>
> > wrote:
> >
> > > Yiran,
> > > Here is one problem causing a failure:
> > > edu.uci.ics.hyracks.api.exceptions.HyracksDataException:
> > > edu.uci.ics.hyracks.api.exceptions.HyracksDataException:
> > > edu.uci.ics.hyracks.storage.am.common.exceptions.TreeIndexDuplicateKeyException:
> > > Input stream given to BTree bulk load has duplicates.
> > >
> > > which tells us that Input stream given to BTree bulk load has duplicates.
> > > The question is why this was not returned as the error message? We need to
> > > look into that.
> > >
> > > I will continue looking at the log file to see if there were other issues.
> > >
> > > Can you share with us the load statement you're using? I would like to see
> > > how you're loading all the files. we might be able to suggest a way to make
> > > it work better.
> > >
> > > Cheers,
> > > Abdullah.
> > >
> > > On Fri, Feb 19, 2016 at 9:31 PM, Yiran Wang <[email protected]> wrote:
> > >
> > >> Abdullah,
> > >>
> > >> Here is the log attached. Thank you all very much for looking into this.
> > >>
> > >> Ian - I have two query questions besides this loading issue. I was
> > >> wondering if I can meet briefly with you (or over email) regarding that.
> > >>
> > >> Thanks!
> > >> Yiran
> > >>
> > >> On Fri, Feb 19, 2016 at 9:38 AM, Mike Carey <[email protected]> wrote:
> > >>
> > >>> Maybe Ian can visit the cluster with Yiran later today?
> > >>> On Feb 19, 2016 1:31 AM, "abdullah alamoudi" <[email protected]> wrote:
> > >>>
> > >>>> Yiran,
> > >>>> Can you share the logs? It would help us identifying the actual cause
> > >>>> of this failure much faster.
> > >>>>
> > >>>> I am pretty sure you know this but in case you didn't, you can get the
> > >>>> logs using
> > >>>> >managix log -n <instance-name>
> > >>>>
> > >>>> Also, it would be nice if someone from the team has access to the
> > >>>> cluster so we can work with it directly.
> > >>>> Cheers,
> > >>>> Abdullah.
> > >>>>
> > >>>>
> > >>>> On Fri, Feb 19, 2016 at 9:40 AM, Yiran Wang <[email protected]> wrote:
> > >>>>
> > >>>>> Steven,
> > >>>>>
> > >>>>> Thanks for getting back to me so quickly! I wasn't clear. Here is what
> > >>>>> happened:
> > >>>>>
> > >>>>> I test-loaded the first 32 files, no problem. I deleted the dataset,
> > >>>>> created a new one, and tried to load the entire 76 files into the newly
> > >>>>> created (hence empty) dataset.
> > >>>>>
> > >>>>> It took about 2mins after executing the query for the error message to
> > >>>>> show up. There are currently 31710406 rows of data in the dataset, despite
> > >>>>> the error message (so it looks like it did load).
> > >>>>>
> > >>>>> So my questions are: 1) why did I still get that error message when I
> > >>>>> was loading to an empty dataset; and 2) I'm not sure if all the data from
> > >>>>> the 76 file are fully loaded. Is there other ways to check, besides trying
> > >>>>> to load it again and hope this time I don't get the error?
> > >>>>>
> > >>>>> Thanks!
> > >>>>> Yiran
> > >>>>>
> > >>>>> On Thu, Feb 18, 2016 at 10:29 PM, Steven Jacobs <[email protected]>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Hi,
> > >>>>>> Welcome! We are an Apache incubator project now so I added the
> > >>>>>> correct mailing list. Our "load" statement only works on an empty dataset.
> > >>>>>> Subsequent data needs to be added with an insert or a feed. You should be
> > >>>>>> able to load all 76 files at once though (starting from empty).
> > >>>>>> Steven
> > >>>>>>
> > >>>>>>
> > >>>>>> On Thursday, February 18, 2016, Yiran Wang <[email protected]> wrote:
> > >>>>>>
> > >>>>>>> Hi Asterix team!
> > >>>>>>>
> > >>>>>>> I've come across this error when I was trying to load 76 files into
> > >>>>>>> a dataset. When I test-loaded the first 32 files, there wasn't such an
> > >>>>>>> error. All 76 files are of the same data format.
> > >>>>>>>
> > >>>>>>> Can you help interpret what this error message means?
> > >>>>>>>
> > >>>>>>> Thanks!
> > >>>>>>> Yiran
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> Best,
> > >>>>>>> Yiran
> > >>>>>>>
> > >>>>>>> --
> > >>>>>>> You received this message because you are subscribed to the Google
> > >>>>>>> Groups "asterixdb-dev" group.
> > >>>>>>> To unsubscribe from this group and stop receiving emails from it,
> > >>>>>>> send an email to [email protected].
> > >>>>>>> For more options, visit https://groups.google.com/d/optout.
> > >>>>>>>
> > >>>>>> --
> > >>>>>> You received this message because you are subscribed to the Google
> > >>>>>> Groups "asterixdb-users" group.
> > >>>>>> To unsubscribe from this group and stop receiving emails from it,
> > >>>>>> send an email to [email protected].
> > >>>>>> For more options, visit https://groups.google.com/d/optout.
> > >>>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Best,
> > >>>>> Yiran
> > >>>>>
> > >>
> > >> --
> > >> Best,
> > >> Yiran
> > >>
