If you are saying that FM can handle multiple connections at one time, then 
multiple crawlers pointing to the same FM should increase performance 
significantly. But that’s not what we saw in our tests.

For example,
I saw barely a 2-minute performance difference between 2FM-6CR and 3FM-9CR 
(tests 3 and 5 below).

1) 2 hours  6 minutes to process 262G   (1FM  3CR - 3CR to 1FM)
2) 1 hour  58 minutes to process 262G   (1FM  6CR - 6CR to 1FM)
3) 1 hour  39 minutes to process 262G   (2FM  6CR - 3CR to 1FM)
4) 1 hour  39 minutes to process 262G   (2FM  9CR - 4+CR to 1FM)
5) 1 hour  37 minutes to process 262G   (3FM  9CR - 3CR to 1FM)
6) 2 hours            to process 262G   (3FM 20CR - 6+CR to 1FM)
7)         28 minutes to process 262G   (6FM  9CR - 1+CR to 1FM)
   => This is my latest test and this is a good number.
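
To put the runs above on one scale, here is a rough sketch that converts each 
timing into throughput and into speedup relative to test 1. The timings and the 
262G figure come from the list; the function names are only illustrative, and 
262G is treated as 262 GB, which is an assumption.

```python
# Timings copied from the test list; every run processed the same 262G.
DATA_GB = 262  # assumption: "262G" means 262 GB

# (label, file managers, crawlers, elapsed minutes)
runs = [
    ("1FM 3CR", 1, 3, 126),   # 2 hours 6 minutes
    ("1FM 6CR", 1, 6, 118),   # 1 hour 58 minutes
    ("2FM 6CR", 2, 6, 99),    # 1 hour 39 minutes
    ("2FM 9CR", 2, 9, 99),    # 1 hour 39 minutes
    ("3FM 9CR", 3, 9, 97),    # 1 hour 37 minutes
    ("3FM 20CR", 3, 20, 120), # 2 hours
    ("6FM 9CR", 6, 9, 28),    # 28 minutes
]


def throughput_gb_per_min(minutes):
    """Average data rate for one run, in GB per minute."""
    return DATA_GB / minutes


def speedup(baseline_minutes, minutes):
    """How many times faster a run is than the baseline run."""
    return baseline_minutes / minutes


baseline = runs[0][3]  # test 1 (1FM 3CR) is the reference point
for label, fms, crawlers, minutes in runs:
    print(f"{label:<9} {throughput_gb_per_min(minutes):6.2f} GB/min  "
          f"speedup x{speedup(baseline, minutes):.2f}")
```

On these numbers, doubling the crawlers against a single FM (test 1 vs. test 2) 
gives a speedup of about 1.07, while going to 6 FMs (test 7) gives 4.5.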

Regards
--
Chintu Mistry
NASA Goddard Space Flight Center
Bldg L40B, Room S776
Office: 240 684 0477
Mobile: 770 310 1047

From: <Mattmann>, Chris A <chris.a.mattm...@jpl.nasa.gov>
Date: Wednesday, December 12, 2012 2:51 PM
To: "Mistry, Chintu (GSFC-586.0)[COLUMBUS TECHNOLOGIES AND SERVICES INC]" 
<chintu.mis...@nasa.gov>, "dev@oodt.apache.org" <dev@oodt.apache.org>
Subject: Re: OODT 0.3 branch

Hey Chintu,

From: <Mistry>, "Chintu [COLUMBUS TECHNOLOGIES AND SERVICES INC] (GSFC-586.0)" 
<chintu.mis...@nasa.gov>
Date: Tuesday, December 11, 2012 2:41 PM
To: jpluser <chris.a.mattm...@jpl.nasa.gov>, 
"dev@oodt.apache.org" <dev@oodt.apache.org>
Subject: Re: OODT 0.3 branch

Answers inline below.

---snip

Gotcha, so you are using different product types. So each crawler is crawling 
various product types in one of the staging area dirs, which look like, 
e.g.,

/STAGING_AREA_BASE
  /dir1 – 1st crawler
    - file1 of product type 1
    - file2 of product type 3

  /dir2 – 2nd crawler
    - file3 of product type 3

  /dir3 – 3rd crawler
    - file4 of product type 2

Is that what the staging area looks like? - YES

And then your FM is ingesting all 3 product types (I just picked 3 arbitrarily; 
it could have been N) into:

ARCHIVE_BASE/{ProductTypeName}/{YYYYMMDD}

Correct?  - YES
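
As a rough illustration of that layout only, the sketch below builds a target 
directory from the ARCHIVE_BASE/{ProductTypeName}/{YYYYMMDD} pattern. The 
function name `archive_dir` and its parameters are hypothetical, this is not 
OODT's own code, and which date ends up in the YYYYMMDD part is an assumption 
(here the caller simply passes one in).

```python
from datetime import date
from pathlib import PurePosixPath


def archive_dir(archive_base, product_type_name, day):
    """Return ARCHIVE_BASE/{ProductTypeName}/{YYYYMMDD} as a POSIX path.

    Hypothetical helper: `day` is whatever date the caller decides to file
    the product under (an assumption, not something the thread specifies).
    """
    return PurePosixPath(archive_base) / product_type_name / day.strftime("%Y%m%d")


# Example with made-up values: two product types ingested on the same day
# land in different directories, so they do not share a target dir.
print(archive_dir("/ARCHIVE_BASE", "ProductType1", date(2012, 12, 12)))
print(archive_dir("/ARCHIVE_BASE", "ProductType3", date(2012, 12, 12)))
```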

If so, I would imagine that FM1, FM2, and FM3 would actually speed up the 
ingestion process compared to just using 1 FM with 1, 2, or 3 crawlers all 
talking to it.

Let me ask a few more questions:

Do you see, e.g., in the above example, that file4 is ingested before file2? 
What about file3 before file2? If not, there is something wiggy going on.
       - I have not checked that. I guess I can check that. Can FM handle 
multiple connections at the same time?


Yep, FM can handle multiple connections at one time, up to a limit (I think 
hard-defaulted to ~100-200 by the underlying XMLRPC 2.1 library). We're using 
an old library currently, but have a goal to upgrade to the latest version, 
where I think this # is configurable.
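
To make the idea of a connection cap concrete, here is a toy sketch in which 
several crawler threads call one server that admits only a fixed number of 
requests at a time. This is not OODT or XML-RPC code: `CappedServer`, 
`run_crawlers`, the cap of 2, and the sleep that stands in for an ingest are 
all made up for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class CappedServer:
    """Toy stand-in for a server that serves at most max_connections at once."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)
        self._lock = threading.Lock()
        self._active = 0
        self.peak_active = 0  # highest number of simultaneous requests seen
        self.handled = 0      # total requests completed

    def handle(self, request_id):
        with self._slots:  # callers block here while the cap is reached
            with self._lock:
                self._active += 1
                self.peak_active = max(self.peak_active, self._active)
            time.sleep(0.01)  # pretend to ingest one file
            with self._lock:
                self._active -= 1
                self.handled += 1
        return request_id


def run_crawlers(server, n_crawlers, files_per_crawler):
    """Send n_crawlers * files_per_crawler requests from n_crawlers threads."""
    with ThreadPoolExecutor(max_workers=n_crawlers) as pool:
        ids = range(n_crawlers * files_per_crawler)
        return list(pool.map(server.handle, ids))


server = CappedServer(max_connections=2)
run_crawlers(server, n_crawlers=6, files_per_crawler=3)
print(server.handled, "requests handled, peak concurrency", server.peak_active)
```

In this toy model, extra crawlers beyond the cap just wait for a slot, so they 
add no throughput; whether the real FM behaves this way is a separate question.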

Cheers,
Chris
