Sounds like a good candidate for a JIRA issue, so we won't forget. :)

Cheers,
Till
> On Feb 20, 2016, at 21:44, abdullah alamoudi <[email protected]> wrote:
>
> Totally agree. Probably better make sure it works nicely with that many
> tasks and then fix the number of readers.
>
> Cheers,
> Abdullah.
>
>> On Sun, Feb 21, 2016 at 2:04 AM, Mike Carey <[email protected]> wrote:
>>
>> Sounds like the load job parallelism needs a redo - it probably shouldn't
>> be more than the number of target partitions IMO...?
>>> On Feb 20, 2016 12:41 PM, "abdullah alamoudi" <[email protected]> wrote:
>>>
>>> I have an idea that might explain why such strange behavior happened. I
>>> believe it could be due to the number of task partitions being very high,
>>> assuming each of the 76 files is being read in a separate task.
>>> This could potentially lead to some corner cases that we didn't consider
>>> before: if the number of threads in the task thread pool is less than 76,
>>> some tasks will not be able to start until others have completed
>>> execution.
>>>
>>> Just a thought,
>>> Abdullah.
>>>
>>> On Fri, Feb 19, 2016 at 9:43 PM, abdullah alamoudi <[email protected]>
>>> wrote:
>>>
>>>> Yiran,
>>>> Here is one problem causing a failure:
>>>> edu.uci.ics.hyracks.api.exceptions.HyracksDataException:
>>>> edu.uci.ics.hyracks.api.exceptions.HyracksDataException:
>>>> edu.uci.ics.hyracks.storage.am.common.exceptions.TreeIndexDuplicateKeyException:
>>>> Input stream given to BTree bulk load has duplicates.
>>>>
>>>> which tells us that the input stream given to the BTree bulk load has
>>>> duplicates. The question is why this was not returned as the error
>>>> message. We need to look into that.
>>>>
>>>> I will continue looking at the log file to see if there were other
>>>> issues.
>>>>
>>>> Can you share with us the load statement you're using? I would like to
>>>> see how you're loading all the files. We might be able to suggest a way
>>>> to make it work better.
>>>>
>>>> Cheers,
>>>> Abdullah.
>>>>
>>>>> On Fri, Feb 19, 2016 at 9:31 PM, Yiran Wang <[email protected]> wrote:
>>>>>
>>>>> Abdullah,
>>>>>
>>>>> Here is the log attached. Thank you all very much for looking into
>>>>> this.
>>>>>
>>>>> Ian - I have two query questions besides this loading issue. I was
>>>>> wondering if I can meet briefly with you (or over email) regarding
>>>>> that.
>>>>>
>>>>> Thanks!
>>>>> Yiran
>>>>>
>>>>> On Fri, Feb 19, 2016 at 9:38 AM, Mike Carey <[email protected]> wrote:
>>>>>
>>>>>> Maybe Ian can visit the cluster with Yiran later today?
>>>>>> On Feb 19, 2016 1:31 AM, "abdullah alamoudi" <[email protected]> wrote:
>>>>>>
>>>>>>> Yiran,
>>>>>>> Can you share the logs? It would help us identify the actual cause
>>>>>>> of this failure much faster.
>>>>>>>
>>>>>>> I am pretty sure you know this, but in case you don't, you can get
>>>>>>> the logs using
>>>>>>>> managix log -n <instance-name>
>>>>>>>
>>>>>>> Also, it would be nice if someone from the team had access to the
>>>>>>> cluster so we can work with it directly.
>>>>>>> Cheers,
>>>>>>> Abdullah.
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Feb 19, 2016 at 9:40 AM, Yiran Wang <[email protected]> wrote:
>>>>>>>
>>>>>>>> Steven,
>>>>>>>>
>>>>>>>> Thanks for getting back to me so quickly! I wasn't clear. Here is
>>>>>>>> what happened:
>>>>>>>>
>>>>>>>> I test-loaded the first 32 files, no problem. I deleted the dataset,
>>>>>>>> created a new one, and tried to load the entire 76 files into the
>>>>>>>> newly created (hence empty) dataset.
>>>>>>>>
>>>>>>>> It took about 2 mins after executing the query for the error message
>>>>>>>> to show up. There are currently 31710406 rows of data in the dataset,
>>>>>>>> despite the error message (so it looks like it did load).
>>>>>>>>
>>>>>>>> So my questions are: 1) why did I still get that error message when I
>>>>>>>> was loading to an empty dataset; and 2) I'm not sure if all the data
>>>>>>>> from the 76 files are fully loaded.
>>>>>>>> Is there another way to check, besides trying to load it again and
>>>>>>>> hoping this time I don't get the error?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Yiran
>>>>>>>>
>>>>>>>> On Thu, Feb 18, 2016 at 10:29 PM, Steven Jacobs <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> Welcome! We are an Apache incubator project now so I added the
>>>>>>>>> correct mailing list. Our "load" statement only works on an empty
>>>>>>>>> dataset. Subsequent data needs to be added with an insert or a feed.
>>>>>>>>> You should be able to load all 76 files at once though (starting
>>>>>>>>> from empty).
>>>>>>>>> Steven
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thursday, February 18, 2016, Yiran Wang <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Asterix team!
>>>>>>>>>>
>>>>>>>>>> I've come across this error when I was trying to load 76 files into
>>>>>>>>>> a dataset. When I test-loaded the first 32 files, there wasn't
>>>>>>>>>> such an error. All 76 files are of the same data format.
>>>>>>>>>>
>>>>>>>>>> Can you help interpret what this error message means?
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>> Yiran
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Best,
>>>>>>>>>> Yiran
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>>> Groups "asterixdb-dev" group.
>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>>> send an email to [email protected].
>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "asterixdb-users" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to [email protected].
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
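Since the failure quoted above was reported as duplicates in the input stream given to the BTree bulk load, one way to check the input before retrying the load is to scan the files for repeated primary keys. What follows is a minimal Python sketch and not part of AsterixDB. It assumes, purely for illustration, that the files are line-oriented text and that the primary key can be extracted from each line; `find_duplicate_keys` and `first_column` are hypothetical helper names, and the comma delimiter is an assumption about the data format, which the thread does not state.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Location = Tuple[str, int]  # (file path, 1-based line number)


def find_duplicate_keys(
    paths: Iterable[str],
    key_of: Callable[[str], str],
) -> Dict[str, List[Location]]:
    """Return {key: [locations]} for every key that occurs more than once.

    key_of is a caller-supplied function that extracts the primary key
    from one line of input (an assumption about the file layout).
    """
    first_seen: Dict[str, Location] = {}
    duplicates: Dict[str, List[Location]] = defaultdict(list)
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                line = line.rstrip("\n")
                if not line:
                    continue  # skip blank lines
                key = key_of(line)
                if key in first_seen:
                    if key not in duplicates:
                        # Record where the key first appeared as well.
                        duplicates[key].append(first_seen[key])
                    duplicates[key].append((path, line_number))
                else:
                    first_seen[key] = (path, line_number)
    return dict(duplicates)


def first_column(line: str, delimiter: str = ",") -> str:
    """Hypothetical key extractor: treat the first delimited field as the key."""
    return line.split(delimiter, 1)[0]
```

Calling `find_duplicate_keys(paths, first_column)` on the 76 input files would return an empty dictionary if no key repeats. The sketch keeps every key in memory, so for tens of millions of rows an external sort or a hash of each key may be a better fit.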
