Yiran,

Could you share all the AQL statements involved in the load, and indicate which file contains the duplicated primary keys? That should help us understand what is going on and, hopefully, find a solution.
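In case it helps to confirm which file is the problem, here is a rough sketch of how duplicated keys could be located outside of AsterixDB. It assumes the input files are delimited text with the primary key in the first column, which may not match your data. The function name `find_duplicate_keys` and its parameters are illustrative only and are not part of AsterixDB or managix.

```python
import csv
from collections import defaultdict


def find_duplicate_keys(paths, key_column=0, delimiter=","):
    """Return {key: [(path, line_number), ...]} for keys seen more than once.

    Assumption: delimited-text files with the primary key in `key_column`.
    Adjust the parsing if the files use another format.
    """
    seen = {}                       # key -> first (path, line_number) it appeared at
    duplicates = defaultdict(list)  # key -> every location, first one included
    for path in paths:
        with open(path, newline="") as handle:
            reader = csv.reader(handle, delimiter=delimiter)
            for line_number, row in enumerate(reader, start=1):
                if not row:
                    continue  # skip blank lines
                key = row[key_column]
                if key in seen:
                    if not duplicates[key]:
                        # Record the first occurrence once
                        duplicates[key].append(seen[key])
                    duplicates[key].append((path, line_number))
                else:
                    seen[key] = (path, line_number)
    return dict(duplicates)
```

Passing the files in the same order as they appear in the load statement makes the reported locations easy to compare with the point where the load failed. All keys are held in memory, so for tens of millions of rows this needs a machine with enough RAM.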
On Fri, Feb 19, 2016 at 2:58 PM, Yiran Wang <[email protected]> wrote:
> Young-Seok,
>
> Thank you for your feedback. You are right there are some duplicated
> primary keys. It took me some time, but I did locate the file where the
> duplicated primary keys are from.
>
> If the load function loads files in sequence as written in the query, the
> problematic file is located towards the end. Maybe that is why there are
> still many instances got loaded into the dataset before it hit the
> problematic file?
>
> Thanks again,
> Yiran
>
> On Fri, Feb 19, 2016 at 10:53 AM, Young-Seok Kim <[email protected]>
> wrote:
>
>> By quickly looking at the log, there seems to exist duplicated primary
>> keys in the files to be loaded.
>> That seems the first cause of the problem.
>> But I'm not sure why the load query continues trying to load data further
>> instead of stop when the duplication was found.
>> This unexpected behavior seems to have introduced the "Cannot load an
>> index that is not empty" exception.
>>
>> The following shows the snippet of the exceptions appeared in the log
>> file attached.
>>
>> ---------------------------------------
>> SEVERE: Setting uncaught exception handler
>> edu.uci.ics.hyracks.api.lifecycle.LifeCycleComponentManager@46844c3d
>> edu.uci.ics.hyracks.api.exceptions.HyracksDataException:
>> edu.uci.ics.hyracks.api.exceptions.HyracksDataException:
>> edu.uci.ics.hyracks.storage.am.common.exceptions.TreeIndexDuplicateKeyException:
>> Input stream given to BTree bulk load has duplicates.
>> Caused by: edu.uci.ics.hyracks.api.exceptions.HyracksDataException:
>> edu.uci.ics.hyracks.storage.am.common.exceptions.TreeIndexDuplicateKeyException:
>> Input stream given to BTree bulk load has duplicates.
>> Caused by:
>> edu.uci.ics.hyracks.storage.am.common.exceptions.TreeIndexDuplicateKeyException:
>> Input stream given to BTree bulk load has duplicates.
>> edu.uci.ics.hyracks.api.exceptions.HyracksDataException:
>> edu.uci.ics.hyracks.storage.am.common.api.TreeIndexException: Cannot load
>> an index that is not empty
>> Caused by: edu.uci.ics.hyracks.storage.am.common.api.TreeIndexException:
>> Cannot load an index that is not empty
>>
>> Best,
>> Young-Seok
>>
>> On Fri, Feb 19, 2016 at 10:31 AM, Yiran Wang <[email protected]> wrote:
>>
>>> Abdullah,
>>>
>>> Here is the log attached. Thank you all very much for looking into this.
>>>
>>> Ian - I have two query questions besides this loading issue. I was
>>> wondering if I can meet briefly with you (or over email) regarding that.
>>>
>>> Thanks!
>>> Yiran
>>>
>>> On Fri, Feb 19, 2016 at 9:38 AM, Mike Carey <[email protected]> wrote:
>>>
>>>> Maybe Ian can visit the cluster with Yiran later today?
>>>> On Feb 19, 2016 1:31 AM, "abdullah alamoudi" <[email protected]>
>>>> wrote:
>>>>
>>>>> Yiran,
>>>>> Can you share the logs? It would help us identifying the actual cause
>>>>> of this failure much faster.
>>>>>
>>>>> I am pretty sure you know this but in case you didn't, you can get the
>>>>> logs using
>>>>> >managix log -n <instance-name>
>>>>>
>>>>> Also, it would be nice if someone from the team has access to the
>>>>> cluster so we can work with it directly.
>>>>> Cheers,
>>>>> Abdullah.
>>>>>
>>>>>
>>>>> On Fri, Feb 19, 2016 at 9:40 AM, Yiran Wang <[email protected]> wrote:
>>>>>
>>>>>> Steven,
>>>>>>
>>>>>> Thanks for getting back to me so quickly! I wasn't clear. Here is
>>>>>> what happened:
>>>>>>
>>>>>> I test-loaded the first 32 files, no problem. I deleted the dataset,
>>>>>> created a new one, and tried to load the entire 76 files into the newly
>>>>>> created (hence empty) dataset.
>>>>>>
>>>>>> It took about 2mins after executing the query for the error message
>>>>>> to show up. There are currently 31710406 rows of data in the dataset,
>>>>>> despite the error message (so it looks like it did load).
>>>>>>
>>>>>> So my questions are: 1) why did I still get that error message when I
>>>>>> was loading to an empty dataset; and 2) I'm not sure if all the data from
>>>>>> the 76 file are fully loaded. Is there other ways to check, besides
>>>>>> trying to load it again and hope this time I don't get the error?
>>>>>>
>>>>>> Thanks!
>>>>>> Yiran
>>>>>>
>>>>>> On Thu, Feb 18, 2016 at 10:29 PM, Steven Jacobs <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> Welcome! We are an Apache incubator project now so I added the
>>>>>>> correct mailing list. Our "load" statement only works on an empty
>>>>>>> dataset. Subsequent data needs to be added with an insert or a feed.
>>>>>>> You should be able to load all 76 files at once though (starting
>>>>>>> from empty).
>>>>>>> Steven
>>>>>>>
>>>>>>>
>>>>>>> On Thursday, February 18, 2016, Yiran Wang <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Asterix team!
>>>>>>>>
>>>>>>>> I've come across this error when I was trying to load 76 files into
>>>>>>>> a dataset. When I test-loaded the first 32 files, there wasn't such an
>>>>>>>> error. All 76 files are of the same data format.
>>>>>>>>
>>>>>>>> Can you help interpret what this error message means?
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Yiran
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best,
>>>>>>>> Yiran
>>>>>>>>
>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "asterixdb-dev" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to [email protected].
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "asterixdb-users" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> For more options, visit https://groups.google.com/d/optout.
