Re: Cannot load an index that is not empty [TreeIndexException]

Till Westmann Sat, 20 Feb 2016 21:52:26 -0800

Sounds like a good candidate for a JIRA issue, so we won't forget. :)

Cheers,
Till


> On Feb 20, 2016, at 21:44, abdullah alamoudi <[email protected]> wrote:
> 
> Totally agree. It is probably better to make sure it works nicely with that
> many tasks and then fix the number of readers.
> 
> Cheers,
> Abdullah.
> 
>> On Sun, Feb 21, 2016 at 2:04 AM, Mike Carey <[email protected]> wrote:
>> 
>> Sounds like the load job parallelism needs a redo - it probably shouldn't
>> be more than the number of target partitions IMO...?
>>> On Feb 20, 2016 12:41 PM, "abdullah alamoudi" <[email protected]> wrote:
>>> 
>>> I have an idea that might explain this strange behavior. I believe it
>>> could be due to the number of task partitions being very high, assuming
>>> each of the 76 files is read in a separate task.
>>> This could lead to corner cases that we didn't consider before: given
>>> that the number of threads in the task thread pool is less than 76, some
>>> tasks will not be able to start until others have completed execution.
>>> 
>>> Just a thought,
>>> Abdullah.
>>> 
>>> On Fri, Feb 19, 2016 at 9:43 PM, abdullah alamoudi <[email protected]>
>>> wrote:
>>> 
>>>> Yiran,
>>>> Here is one problem causing a failure:
>>>> edu.uci.ics.hyracks.api.exceptions.HyracksDataException:
>>>> edu.uci.ics.hyracks.api.exceptions.HyracksDataException:
>>>> edu.uci.ics.hyracks.storage.am.common.exceptions.TreeIndexDuplicateKeyException:
>>>> Input stream given to BTree bulk load has duplicates.
>>>> 
>>>> which tells us that the input stream given to the BTree bulk load has
>>>> duplicates. The question is why this was not returned as the error
>>>> message. We need to look into that.
>>>> 
>>>> I will continue looking at the log file to see if there were other
>>>> issues.
>>>> 
>>>> Can you share with us the load statement you're using? I would like to
>>>> see how you're loading all the files. We might be able to suggest a way
>>>> to make it work better.
>>>> 
>>>> Cheers,
>>>> Abdullah.
>>>> 
>>>>> On Fri, Feb 19, 2016 at 9:31 PM, Yiran Wang <[email protected]> wrote:
>>>>> 
>>>>> Abdullah,
>>>>> 
>>>>> Here is the log attached. Thank you all very much for looking into
>>>>> this.
>>>>> 
>>>>> Ian - I have two query questions besides this loading issue. I was
>>>>> wondering if I can meet briefly with you (or over email) regarding
>>>>> that.
>>>>> 
>>>>> Thanks!
>>>>> Yiran
>>>>> 
>>>>> On Fri, Feb 19, 2016 at 9:38 AM, Mike Carey <[email protected]>
>>>>> wrote:
>>>>> 
>>>>>> Maybe Ian can visit the cluster with Yiran later today?
>>>>>> On Feb 19, 2016 1:31 AM, "abdullah alamoudi" <[email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> Yiran,
>>>>>>> Can you share the logs? It would help us identify the actual cause
>>>>>>> of this failure much faster.
>>>>>>> 
>>>>>>> I am pretty sure you know this, but in case you don't, you can get
>>>>>>> the logs using
>>>>>>>> managix log -n <instance-name>
>>>>>>> 
>>>>>>> Also, it would be nice if someone from the team has access to the
>>>>>>> cluster so we can work with it directly.
>>>>>>> Cheers,
>>>>>>> Abdullah.
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Feb 19, 2016 at 9:40 AM, Yiran Wang <[email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Steven,
>>>>>>>> 
>>>>>>>> Thanks for getting back to me so quickly! I wasn't clear. Here is
>>>>>>>> what happened:
>>>>>>>> 
>>>>>>>> I test-loaded the first 32 files, no problem. I deleted the
>>>>>>>> dataset, created a new one, and tried to load the entire 76 files
>>>>>>>> into the newly created (hence empty) dataset.
>>>>>>>> 
>>>>>>>> It took about 2 minutes after executing the query for the error
>>>>>>>> message to show up. There are currently 31710406 rows of data in the
>>>>>>>> dataset, despite the error message (so it looks like it did load).
>>>>>>>> 
>>>>>>>> So my questions are: 1) why did I still get that error message when
>>>>>>>> I was loading into an empty dataset; and 2) I'm not sure if all the
>>>>>>>> data from the 76 files is fully loaded. Are there other ways to
>>>>>>>> check, besides trying to load it again and hoping that this time I
>>>>>>>> don't get the error?
>>>>>>>> 
>>>>>>>> Thanks!
>>>>>>>> Yiran
>>>>>>>> 
>>>>>>>> On Thu, Feb 18, 2016 at 10:29 PM, Steven Jacobs <[email protected]>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> Welcome! We are an Apache incubator project now so I added the
>>>>>>>>> correct mailing list. Our "load" statement only works on an empty
>>>>>>>>> dataset. Subsequent data needs to be added with an insert or a feed.
>>>>>>>>> You should be able to load all 76 files at once though (starting
>>>>>>>>> from empty).
>>>>>>>>> Steven
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Thursday, February 18, 2016, Yiran Wang <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi Asterix team!
>>>>>>>>>> 
>>>>>>>>>> I've come across this error when I was trying to load 76 files
>>>>>>>>>> into a dataset. When I test-loaded the first 32 files, there
>>>>>>>>>> wasn't such an error. All 76 files are of the same data format.
>>>>>>>>>> 
>>>>>>>>>> Can you help interpret what this error message means?
>>>>>>>>>> 
>>>>>>>>>> Thanks!
>>>>>>>>>> Yiran
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Best,
>>>>>>>>>> Yiran
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>>> Google Groups "asterixdb-dev" group.
>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>>> send an email to [email protected].
>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "asterixdb-users" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to [email protected].
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Best,
>>>>>>>> Yiran
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Best,
>>>>> Yiran
>>>>> 
>>
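
As a rough illustration of the thread-pool scenario Abdullah describes in this
thread (76 reader tasks submitted to a pool with fewer than 76 threads), here
is a minimal Java sketch. It is not AsterixDB or Hyracks code: the class name
ReaderPoolSketch, the method peakConcurrency, the pool size of 8, and the
sleep that stands in for reading a file are all assumptions made for the
example. It only shows that with a fixed-size pool, tasks beyond the pool size
wait in the queue until a running task finishes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ReaderPoolSketch {

    /**
     * Submits `tasks` dummy reader tasks to a fixed pool of `poolSize` threads
     * and returns the largest number of tasks observed running at the same time.
     */
    static int peakConcurrency(int tasks, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);

        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                // Record how many tasks are active right now.
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(5); // hypothetical stand-in for reading one file
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                    done.countDown();
                }
            });
        }

        try {
            done.await(); // wait until every task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 76 tasks, as in the thread; 8 threads is an arbitrary assumed pool size.
        int peak = peakConcurrency(76, 8);
        System.out.println("peak concurrent readers: " + peak);
    }
}
```

The peak never exceeds the pool size, so most of the 76 tasks start only after
earlier ones complete. If any task had to wait on a task that is still queued,
the job could stall, which is one reason to cap the number of readers at the
number of target partitions as Mike suggests.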
