Young-Seok, I will just go ahead and change the duplicated keys I have in my original file. That should solve my loading problem. I was describing what's going on in case that is relevant for you to understand why a lot of files still got loaded into the dataset.
Thanks!
Yiran

On Fri, Feb 19, 2016 at 3:19 PM, Young-Seok Kim <[email protected]> wrote:

> Yiran,
>
> Could you show all AQLs involved in the loading with indicating the
> problematic file which includes the duplicated primary keys?
> Then, we may better understand what's going on and may get the solution
> hopefully.
>
> On Fri, Feb 19, 2016 at 2:58 PM, Yiran Wang <[email protected]> wrote:
>
>> Young-Seok,
>>
>> Thank you for your feedback. You are right there are some duplicated
>> primary keys. It took me some time, but I did locate the file where the
>> duplicated primary keys are from.
>>
>> If the load function loads files in sequence as written in the query, the
>> problematic file is located towards the end. Maybe that is why there are
>> still many instances got loaded into the dataset before it hit the
>> problematic file?
>>
>> Thanks again,
>> Yiran
>>
>> On Fri, Feb 19, 2016 at 10:53 AM, Young-Seok Kim <[email protected]>
>> wrote:
>>
>>> By quickly looking at the log, there seems to exist duplicated primary
>>> keys in the files to be loaded.
>>> That seems the first cause of the problem.
>>> But I'm not sure why the load query continues trying to load data
>>> further instead of stop when the duplication was found.
>>> This unexpected behavior seems to have introduced the "Cannot load an
>>> index that is not empty" exception.
>>>
>>> The following shows the snippet of the exceptions appeared in the log
>>> file attached.
>>>
>>> ---------------------------------------
>>> SEVERE: Setting uncaught exception handler
>>> edu.uci.ics.hyracks.api.lifecycle.LifeCycleComponentManager@46844c3d
>>> edu.uci.ics.hyracks.api.exceptions.HyracksDataException:
>>> edu.uci.ics.hyracks.api.exceptions.HyracksDataException:
>>> edu.uci.ics.hyracks.storage.am.common.exceptions.TreeIndexDuplicateKeyException:
>>> Input stream given to BTree bulk load has duplicates.
>>> Caused by: edu.uci.ics.hyracks.api.exceptions.HyracksDataException:
>>> edu.uci.ics.hyracks.storage.am.common.exceptions.TreeIndexDuplicateKeyException:
>>> Input stream given to BTree bulk load has duplicates.
>>> Caused by:
>>> edu.uci.ics.hyracks.storage.am.common.exceptions.TreeIndexDuplicateKeyException:
>>> Input stream given to BTree bulk load has duplicates.
>>> edu.uci.ics.hyracks.api.exceptions.HyracksDataException:
>>> edu.uci.ics.hyracks.storage.am.common.api.TreeIndexException: Cannot load
>>> an index that is not empty
>>> Caused by: edu.uci.ics.hyracks.storage.am.common.api.TreeIndexException:
>>> Cannot load an index that is not empty
>>>
>>> Best,
>>> Young-Seok
>>>
>>> On Fri, Feb 19, 2016 at 10:31 AM, Yiran Wang <[email protected]> wrote:
>>>
>>>> Abdullah,
>>>>
>>>> Here is the log attached. Thank you all very much for looking into this.
>>>>
>>>> Ian - I have two query questions besides this loading issue. I was
>>>> wondering if I can meet briefly with you (or over email) regarding that.
>>>>
>>>> Thanks!
>>>> Yiran
>>>>
>>>> On Fri, Feb 19, 2016 at 9:38 AM, Mike Carey <[email protected]> wrote:
>>>>
>>>>> Maybe Ian can visit the cluster with Yiran later today?
>>>>> On Feb 19, 2016 1:31 AM, "abdullah alamoudi" <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Yiran,
>>>>>> Can you share the logs? It would help us identifying the actual cause
>>>>>> of this failure much faster.
>>>>>>
>>>>>> I am pretty sure you know this but in case you didn't, you can get
>>>>>> the logs using
>>>>>> >managix log -n <instance-name>
>>>>>>
>>>>>> Also, it would be nice if someone from the team has access to the
>>>>>> cluster so we can work with it directly.
>>>>>> Cheers,
>>>>>> Abdullah.
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 19, 2016 at 9:40 AM, Yiran Wang <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Steven,
>>>>>>>
>>>>>>> Thanks for getting back to me so quickly! I wasn't clear.
>>>>>>> Here is what happened:
>>>>>>>
>>>>>>> I test-loaded the first 32 files, no problem. I deleted the dataset,
>>>>>>> created a new one, and tried to load the entire 76 files into the newly
>>>>>>> created (hence empty) dataset.
>>>>>>>
>>>>>>> It took about 2mins after executing the query for the error message
>>>>>>> to show up. There are currently 31710406 rows of data in the dataset,
>>>>>>> despite the error message (so it looks like it did load).
>>>>>>>
>>>>>>> So my questions are: 1) why did I still get that error message when
>>>>>>> I was loading to an empty dataset; and 2) I'm not sure if all the data
>>>>>>> from the 76 file are fully loaded. Is there other ways to check,
>>>>>>> besides trying to load it again and hope this time I don't get the
>>>>>>> error?
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Yiran
>>>>>>>
>>>>>>> On Thu, Feb 18, 2016 at 10:29 PM, Steven Jacobs <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> Welcome! We are an Apache incubator project now so I added the
>>>>>>>> correct mailing list. Our "load" statement only works on an empty
>>>>>>>> dataset.
>>>>>>>> Subsequent data needs to be added with an insert or a feed. You should
>>>>>>>> be able to load all 76 files at once though (starting from empty).
>>>>>>>> Steven
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thursday, February 18, 2016, Yiran Wang <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Asterix team!
>>>>>>>>>
>>>>>>>>> I've come across this error when I was trying to load 76 files
>>>>>>>>> into a dataset. When I test-loaded the first 32 files, there wasn't
>>>>>>>>> such an error. All 76 files are of the same data format.
>>>>>>>>>
>>>>>>>>> Can you help interpret what this error message means?
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> Yiran
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best,
>>>>>>>>> Yiran
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "asterixdb-dev" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to [email protected].
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>
>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "asterixdb-users" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to [email protected].
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
--
Best,
Yiran
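
For anyone hitting the same `TreeIndexDuplicateKeyException`, a pre-load check for duplicated primary keys might look like the sketch below. It is only an illustration and not part of AsterixDB: it assumes each input file holds one JSON object per line and that the primary key lives in a single top-level field. The names `iter_keys` and `find_duplicate_keys` are hypothetical, and the parsing would need adjusting for other data formats.

```python
import json
from collections import defaultdict


def iter_keys(paths, key_field):
    """Yield (path, key) for every record in the given files.

    Assumption: one JSON object per line, with the primary key stored
    in the top-level field named by key_field.
    """
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines
                    yield path, json.loads(line)[key_field]


def find_duplicate_keys(keyed_records):
    """Return {key: [sources, ...]} for every key seen more than once.

    keyed_records is any iterable of (source, key) pairs, for example
    the output of iter_keys().
    """
    sources_by_key = defaultdict(list)
    for source, key in keyed_records:
        sources_by_key[key].append(source)
    # Keep only the keys that occur in more than one record.
    return {key: srcs for key, srcs in sources_by_key.items() if len(srcs) > 1}
```

A call such as `find_duplicate_keys(iter_keys(["part-01.json", "part-02.json"], "id"))` (file names and the `"id"` field are placeholders) returns each repeated key together with the files it appears in, which points directly at the problematic file. Every key is held in memory, so for tens of millions of records a sort-based check may be a better fit.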
