I've got R12B. We're also on the 0.8.0-incubating release of CouchDB.
I'm just curious what my expectations should be for view creation
times. I was also wondering if anyone had tried putting the design
folder on a different disk to improve I/O.
On Wed, Jul 2, 2008 at 2:18 PM, Paul Davis <[EMAIL PROTECTED]> wrote:
> One thing that got me a while back was the version of Erlang I was
> using. If you're not on one of the more recent Erlang releases, R12B or
> so, you might try upgrading that to see if it fixes things.
>
> Paul
>
> On Wed, Jul 2, 2008 at 1:58 PM, Brad King <[EMAIL PROTECTED]> wrote:
>> I created a view with emit(doc.entityobject.sku, null) so that only the
>> doc IDs are emitted. After trying attachments, I nuked the DB and started
>> over, going back to having the documents inline. This is OK, but again,
>> an index build time of about 25 minutes for this view against 300K or so
>> docs seems long. What are you seeing as typical when creating your views
>> against a much larger set? What do your docs look like? Thanks.
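>>
>> For reference, roughly what I have now (just a sketch; the design doc
>> name "products" is a placeholder, I haven't listed the real one here):
>>
>> // map function that emits only the key, keeping the index small;
>> // the doc _id comes along in every view row automatically
>> function(doc) {
>>   emit(doc.entityobject.sku, null);
>> }
>>
>> // then something like GET /mydb/_view/products/sku?key="ABC-123"
>> // returns the matching id, and GET /mydb/<that id> fetches the doc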
>>
>>
>> On Wed, Jul 2, 2008 at 10:50 AM, Jan Lehnardt <[EMAIL PROTECTED]> wrote:
>>>
>>> On Jul 2, 2008, at 16:17, Brad King wrote:
>>>
>>>> Just to post some results here from working with around 300K docs. I
>>>> changed the view to emit only the doc ID, and index time went down to
>>>> about 25 minutes vs. an hour for the same dataset.
>>>>
>>>> I then converted the largest text field to an attachment and things
>>>> went downhill from there. I deleted the db and started the upload,
>>>> but repeatedly got random 500 server errors with no real way to know
>>>> what was happening or why. Also, the DB size as reported by Futon seemed
>>>> to fluctuate wildly as I was adding documents. And I mean wildly, like
>>>> anywhere from 1.2G and then back down to 144M. Weird. I don't get a very
>>>> warm fuzzy feeling about the stability of using attachments right now.
>>>> Ideally I don't want to use them anyway; I'd prefer to have the
>>>> fields all inline and have the database handle these docs as-is. I
>>>> don't see these as huge documents (2 to 5K) compared to what I
>>>> would store in something like Berkeley DB XML, just for comparison's
>>>> sake, so I'm hoping it's a goal of the project to handle them
>>>> effectively, even when several million documents are added.
>>>
>>> This doesn't sound right at all. Can you make sure you use the
>>> very latest SVN version or the 0.8 release, and completely
>>> new databases? Also, just to clarify: do you emit the doc into
>>> the view payload, as in emit(doc._id, doc)? Or are you just doing
>>> emit(null, null); to get only the doc IDs that matter to you and
>>> then fetching the documents later? I have had the latter setup running
>>> without any problems across ~2 million documents in a database.
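>>>
>>> To illustrate the difference, just as a sketch:
>>>
>>> // heavy: copies every document into the view index
>>> function(doc) { emit(doc._id, doc); }
>>>
>>> // light: the view row already carries the doc id, so emit nothing
>>> // extra and fetch the few documents you actually need afterwards
>>> // with separate GET /db/<id> requests
>>> function(doc) { emit(null, null); }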
>>>
>>>
>>>> As always, thanks for the help.
>>>
>>> Thanks for the problem report.
>>>
>>> Cheers
>>> Jan
>>> --
>>>
>>>>
>>>> On Tue, Jul 1, 2008 at 9:26 AM, Brad King <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> Thanks for the tips. I'll start scaling back the data I'm returning
>>>>> and see if it improves. The largest field is an HTML description of an
>>>>> inventory item, which seems like a good candidate for a binary
>>>>> attachment, but I need to be able to do full-text searches on this
>>>>> data eventually (hopefully with the Lucene integration), so I'll
>>>>> probably first try just not including the document data in the views.
>>>>> We've had some success with Lucene independent of CouchDB, so I'm
>>>>> pleased you guys are integrating this.
>>>>>
>>>>> On Sat, Jun 21, 2008 at 8:39 AM, Damien Katz <[EMAIL PROTECTED]>
>>>>> wrote:
>>>>>>
>>>>>> Part of the problem is that you are storing copies of the documents in
>>>>>> the btree. If the documents are big, it takes longer to compute on them,
>>>>>> and if the results (emit(...)) are big or numerous, then you'll be
>>>>>> spending most of your time in I/O.
>>>>>>
>>>>>> My advice is to not emit the document into the view and, if you can, to
>>>>>> make the documents smaller in general. If the data can be stored as a
>>>>>> binary attachment, then that too will give you a performance
>>>>>> improvement.
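>>>>>>
>>>>>> For example, a doc with the big HTML field moved into an inline
>>>>>> attachment might look something like this (a rough sketch; the exact
>>>>>> attachment fields depend on the version you're running):
>>>>>>
>>>>>> {
>>>>>>   "_id": "sku-12345",
>>>>>>   "sku": "sku-12345",
>>>>>>   "_attachments": {
>>>>>>     "description.html": {
>>>>>>       "content_type": "text/html",
>>>>>>       "data": "<base64 of the HTML description>"
>>>>>>     }
>>>>>>   }
>>>>>> }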
>>>>>>
>>>>>> -Damien
>>>>>>
>>>>>> On Jun 20, 2008, at 4:51 PM, Brad King wrote:
>>>>>>
>>>>>>> Thanks, yes, it's currently at 357M and growing!
>>>>>>>
>>>>>>> On Fri, Jun 20, 2008 at 4:49 PM, Chris Anderson <[EMAIL PROTECTED]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Brad,
>>>>>>>>
>>>>>>>> You can look at
>>>>>>>>
>>>>>>>> ls -lha /usr/local/var/lib/couchdb/.my-dbname_design/
>>>>>>>>
>>>>>>>> to see the view size growing...
>>>>>>>>
>>>>>>>> It won't tell you when it's done, but it will give you hope that
>>>>>>>> progress is happening.
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>> On Fri, Jun 20, 2008 at 1:45 PM, Brad King <[EMAIL PROTECTED]> wrote:
>>>>>>>>>
>>>>>>>>> I have about 350K documents in a database, typically around 5K each.
>>>>>>>>> I created and saved a view which simply looks at one field in the
>>>>>>>>> document. I called the view for the first time with a key that should
>>>>>>>>> only match one document, and it's been awaiting a response for about
>>>>>>>>> 45 minutes now.
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>>   "sku": {
>>>>>>>>>     "map": "function(doc) { emit(doc.entityobject.SKU, doc); }"
>>>>>>>>>   }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Is this typical, or is there some optimizing to be done on either my
>>>>>>>>> view or the server? I'm also running on a VM, so this may have some
>>>>>>>>> effect, but smaller databases seem to be performing pretty well.
>>>>>>>>> Insert times to set this up were actually really good, I thought, at
>>>>>>>>> 4,000 to 5,000 documents per minute running from my laptop.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Chris Anderson
>>>>>>>> http://jchris.mfdz.com
>>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>