Hello Danny,

 

Yes, I'm using 7.0-4.

 

>> What are you comparing it to on the Oracle side?

>> In MarkLogic, the content will be all indexed and searchable.  Is that
>> true on the orcl side too?

 

The Oracle side is doing a basic CLOB insert with no indexing.

 

The Oracle server we are comparing against is a higher-capacity system, so we
expected to see faster ingestion there.

 

I didn't expect the MarkLogic side to be 4 times slower.

 

Yes, we tried tweaking the batch size. A batch size of 500 gave the fastest
load times.

 

I will investigate further but I believe the bottleneck is on the MarkLogic
side.

 

I believe the CPU on the MarkLogic server has some headroom for more
parallelism.

 

I'll create a custom REST Extension that will spawn multiple threads for the
doc-inserts.

 

I assume the REST API bulk ingestion already does this but I can't say for
sure.

 

I'll keep you posted.

 

Thanks Danny

 

- Gary R

 

 

 

From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Danny Sokolsky
Sent: Tuesday, October 14, 2014 2:00 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] How to optimize the REST API Bulk
Ingestion Performance?

 

Hi Gary,

 

A few thoughts here.  You are using 7.0-4 on this?  

 

What are you comparing it to on the Oracle side?  In MarkLogic, the content
will all be indexed and searchable.  Is that true on the Oracle side too?

 

What indexes do you have enabled?  Maybe you do not need them all (or maybe
you should put the equivalent indexing on the Oracle side)?

 

Have you tried tweaking the batch size?  I would try a smaller number, say
50 or 100.

 

Have you analyzed where you are spending the time?  In the C# code?  In the
code loading the docs on MarkLogic?

 

Do you have multiple threads loading from your .NET program?  If you are not
maxing out your CPU on the MarkLogic side, you probably have room for more
parallelization.
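
A minimal sketch of that client-side parallelism, written in Python rather than
C# purely for illustration. The names chunk, load_parallel and fake_send are
hypothetical, and fake_send only sleeps; in a real loader it would be replaced
by one bulk POST per batch. The batch size and worker count are assumptions to
be tuned by measurement, not recommendations.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def chunk(docs, batch_size):
    """Split docs into consecutive batches of at most batch_size items."""
    return [docs[i:i + batch_size] for i in range(0, len(docs), batch_size)]


def load_parallel(docs, send_batch, batch_size=100, workers=4):
    """Send batches concurrently; return (docs_loaded, elapsed_seconds)."""
    batches = chunk(docs, batch_size)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker thread sends one batch at a time.
        counts = list(pool.map(send_batch, batches))
    return sum(counts), time.perf_counter() - start


def fake_send(batch):
    """Hypothetical stand-in for one bulk request; sleeps instead of POSTing."""
    time.sleep(0.01)
    return len(batch)


if __name__ == "__main__":
    docs = [{"id": i} for i in range(1050)]
    for workers in (1, 4):
        loaded, elapsed = load_parallel(docs, fake_send, workers=workers)
        print(workers, "worker(s):", loaded, "docs in", round(elapsed, 3), "s")
```

Timing the same load with different worker counts, while watching CPU on the
MarkLogic host, is one way to tell whether the client or the server is the
limiting side.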

 

-Danny

 

From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Gary Russo
Sent: Tuesday, October 14, 2014 9:21 AM
To: general@developer.marklogic.com
Subject: [MarkLogic Dev General] How to optimize the REST API Bulk Ingestion
Performance?

 

MarkLogic Bulk ingestion processing is slower than an equivalent Oracle
ingestion process.

 

The MarkLogic ingestion takes 30 minutes. An Oracle equivalent only takes 7
minutes.

 

I'm using the REST API to bulk ingest multiple documents as described here:
http://docs.marklogic.com/guide/rest-dev/bulk#id_54649

 

Notes:

- C# code is used to call the MarkLogic Bulk Ingest REST API.

- Document batch size used is 500.

- Average doc size is 1 KB.

- JSON Conversion and Validation logic occurs in the C# code.
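
For readers unfamiliar with the request shape, here is a rough sketch of how
one batch might be packed into a single multipart/mixed body, in Python for
illustration only. It assumes the layout described in the bulk guide linked
above, where each part carries a Content-Type header and a Content-Disposition
header whose filename is the document URI; check the guide for the exact
headers your server version expects. build_multipart and the sample URIs are
hypothetical names, and nothing is sent over the network here.

```python
import json


def build_multipart(docs, boundary="BOUNDARY"):
    """Return (content_type_header, body_bytes) for a batch of JSON documents.

    docs maps a document URI to a JSON-serializable value.
    """
    lines = []
    for uri, content in docs.items():
        lines.append("--" + boundary)
        lines.append("Content-Type: application/json")
        # Assumption: the part's filename is used as the document URI.
        lines.append('Content-Disposition: attachment; filename="%s"' % uri)
        lines.append("")  # blank line separates part headers from content
        lines.append(json.dumps(content))
    lines.append("--" + boundary + "--")  # closing delimiter
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return "multipart/mixed; boundary=" + boundary, body


if __name__ == "__main__":
    ctype, body = build_multipart({
        "/docs/1.json": {"a": 1},
        "/docs/2.json": {"b": 2},
    })
    print(ctype)
    print(body.decode("utf-8"))
```

With 1 KB documents, a batch of 500 produces a body of roughly half a megabyte
per request, which is worth keeping in mind when comparing batch sizes.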

 

 

Any thoughts on how to optimize the MarkLogic bulk ingest to make it as fast
as Oracle's 7 minute load time?

 

 

Thanks,

Gary R

 

 

Gary Russo

Enterprise NoSQL Developer

http://garyrusso.wordpress.com

 

_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general
