FANTASTIC!

The change to
MySQL|*|NOOP|NOOP|$sql_sm="INSERT IGNORE INTO $mysqlTable VALUES
"|$sql_sm="($k,$v,\'0\')"|$sql_sm=","|1000

brought SpamDB generation down to under 3 minutes.

Apr-29-15 09:41:41 Generating weighted Bayesian tuplets
Apr-29-15 09:42:16 start populating Spamdb with 970,294 records - Bayesian check is now disabled!
Apr-29-15 09:44:29 Finished populating Spamdb with 970,294 records - Bayesian check is now enabled!
Apr-29-15 09:44:29 done - Generating weighted Bayesian tuplets

This is great, though I have no idea what your change is actually doing.
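
My best guess at the structure of that config line, which may well be wrong: the |-separated fields look like a statement header, a per-row template, a "," separator, and a batch size of 1000. If that reading is right, the import builds multi-row INSERT statements roughly like the Python sketch below. The function name, the sample rows, and the assumption that $k and $v arrive already quoted are mine, not ASSP's. The edit itself only swaps '$f' for a constant '0' in the third column; why that matters I can't say.

```python
from typing import Iterable, Iterator, List, Tuple


def build_bulk_inserts(table: str,
                       rows: Iterable[Tuple[str, str]],
                       batch_size: int = 1000) -> Iterator[str]:
    """Yield multi-row INSERT IGNORE statements with at most batch_size rows each.

    Hypothetical helper: it mirrors my reading of the config fields, not ASSP's code.
    """
    header = f"INSERT IGNORE INTO {table} VALUES "  # assumed: the header field
    batch: List[str] = []
    for k, v in rows:
        # assumed: the per-row field, with k and v already quoted/escaped
        batch.append(f"({k},{v},'0')")
        if len(batch) == batch_size:
            yield header + ",".join(batch)          # assumed: the "," separator field
            batch = []
    if batch:                                       # flush the final partial batch
        yield header + ",".join(batch)


sample_rows = [("'a'", "'0.9'"), ("'b'", "'0.1'"), ("'c'", "'0.5'")]
statements = list(build_bulk_inserts("spamdb", sample_rows, batch_size=2))
for statement in statements:
    print(statement)
```

With a batch size of 2 and three rows, this prints one statement carrying two value groups and a second carrying the remaining one. With 1000 per batch, 970,294 records would go over in roughly 971 statements instead of one statement per record.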

3 minutes isn't bad at all though, at least relative to where we were
before.  In another thread, you (Thomas) posted this:
Apr-24-15 04:28:29 start populating Spamdb with 387,637 records - Bayesian check is now disabled!
Apr-24-15 04:29:11 Finished populating Spamdb with 387,637 records - Bayesian check is now enabled!
Apr-24-15 04:29:11 done - Generating weighted Bayesian tuplets
that's about 9k a second

Mine comes to around 5,700 a second, which is still slower than what you're
seeing.  Are you running MySQL Community 5.24?  Any changes to the defaults?
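
For reference, the rates can be recomputed straight from the log stamps. My figure of around 5,700 counts from the 09:41:41 "Generating" line; counting only the populate step, as in Thomas's log, gives a higher rate. A small Python sketch (the helper name is mine, and it assumes an English locale for the month abbreviation):

```python
from datetime import datetime

FMT = "%b-%d-%y %H:%M:%S"  # matches stamps like "Apr-29-15 09:42:16"


def records_per_second(start: str, end: str, records: int) -> float:
    """Average insert rate between two log timestamps (hypothetical helper)."""
    elapsed = (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()
    return records / elapsed


# My run, populate step only (start populating -> Finished populating): 133 s
mine_populate = records_per_second("Apr-29-15 09:42:16", "Apr-29-15 09:44:29", 970_294)
# My run, counted from the "Generating weighted Bayesian tuplets" line: 168 s
mine_whole = records_per_second("Apr-29-15 09:41:41", "Apr-29-15 09:44:29", 970_294)
# Thomas's run, populate step only: 42 s
thomas = records_per_second("Apr-24-15 04:28:29", "Apr-24-15 04:29:11", 387_637)

print(round(mine_populate), round(mine_whole), round(thomas))  # → 7295 5776 9229
```

So on a like-for-like basis the gap is about 7.3k versus 9.2k records a second.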


And other questions:
1) Is MaxBytes still recommended to be around 3000 when the corpus is
full?  It seems to me that the spam messages in my corpus are way bigger
than 3k on average.
2) I have RebuildThreadCycleType set to the default of 30.  Do you think I
should lower this number?  Is that likely to improve rebuild time?
3) How many messages do you have in the corpus, and how long does it take
you to process each folder?  I'm trying to gauge whether I need to beg for
better hardware (charity, not easy).  I'm seeing strange results
(consistent, but strange in my opinion):

Apr-29-15 08:58:54 Processing... messages/errors-spam with 5,679 files
Apr-29-15 09:10:59 Finished in 725 second(s)      0.12 secs per message

Apr-29-15 09:10:59 Processing... messages/errors-notspam with 5,478 files
Apr-29-15 09:31:41 Finished in 1,242 second(s)  0.22 secs per message

Apr-29-15 09:31:41 Processing... messages/spam with 11,805 files
Apr-29-15 09:40:07 Finished in 506 second(s)    0.04 secs per message
(why so much faster than errors-spam?)

Apr-29-15 09:40:07 Processing... messages/notspam with 11,760 files
Apr-29-15 09:41:41 Finished in 94 second(s)    0.008 secs per message
(WOW, why is this always SO fast relative to the others?)

This is with MaxBytes at 3k.


And last: have you given any thought to using a temporary table for this?
You could populate a table called SpamDB.rebuilding or something similar
without having to disable Bayesian checks during the rebuild.  Then quickly
turn off Bayesian, delete spamdb, and rename spamdb.rebuilding to spamdb.
The same could be done with HMM.
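
To make the idea concrete, here is a minimal sketch of the build-then-swap pattern. It uses Python's built-in sqlite3 as a stand-in so that it runs anywhere; it is not ASSP code, and the column names and sample rows are made up. I've used an underscore in the staging table name because a dot in a MySQL table name would have to be quoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Live table that lookups keep using while the rebuild runs.
# Column names here are hypothetical.
cur.execute("CREATE TABLE spamdb (pkey TEXT PRIMARY KEY, pvalue TEXT, pfrozen TEXT)")
cur.execute("INSERT INTO spamdb VALUES ('stale', '0.5', '0')")

# 1) Populate the replacement under a different name; the live table is untouched.
cur.execute("CREATE TABLE spamdb_rebuilding (pkey TEXT PRIMARY KEY, pvalue TEXT, pfrozen TEXT)")
cur.executemany(
    "INSERT OR IGNORE INTO spamdb_rebuilding VALUES (?, ?, '0')",
    [("viagra", "0.99"), ("meeting", "0.01")],
)
conn.commit()

# 2) Swap the tables. This is the only window in which checks would need to pause.
#    In MySQL the two renames can be issued as a single atomic statement:
#      RENAME TABLE spamdb TO spamdb_old, spamdb_rebuilding TO spamdb;
cur.execute("ALTER TABLE spamdb RENAME TO spamdb_old")
cur.execute("ALTER TABLE spamdb_rebuilding RENAME TO spamdb")
cur.execute("DROP TABLE spamdb_old")
conn.commit()

live_keys = sorted(row[0] for row in cur.execute("SELECT pkey FROM spamdb"))
print(live_keys)  # → ['meeting', 'viagra']
```

The slow part (the bulk insert) happens against the staging table, so the live table only goes away for the duration of the rename.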



On Wed, Apr 29, 2015 at 5:00 AM, Thomas Eckardt <thomas.ecka...@thockar.com>
wrote:

> Try the following:
>
> change the following line in the assp_db_import.cfg
>
> MySQL|*|NOOP|NOOP|$sql_sm="INSERT IGNORE INTO $mysqlTable VALUES
> "|$sql_sm="($k,$v,\'$f\')"|$sql_sm=","|1000
>
> to
>
> MySQL|*|NOOP|NOOP|$sql_sm="INSERT IGNORE INTO $mysqlTable VALUES
> "|$sql_sm="($k,$v,\'0\')"|$sql_sm=","|1000
>
> It replaces $f with 0:  "($k,$v,\'$f\')"  ->  "($k,$v,\'0\')"
>
> This issue seems to depend on something that is currently unknown to me.
> I'm still looking for the cause, but this change in assp_db_import.cfg
> will prevent the Bulk-Import from being skipped.
> The next build will fix the issue, so that the original
> assp_db_import.cfg can be used again.
>
> Thomas
>
> From:    K Post <nntp.p...@gmail.com>
> To:      ASSP development mailing list <assp-test@lists.sourceforge.net>
> Date:    28.04.2015 20:10
> Subject: Re: [Assp-test] MySQL vs BerkeleyDB
>
>
>
> Sorry for the seemingly incessant emails...
>
> Error: You have an error in your SQL syntax; check the manual that
> corresponds to your MySQL server version for the right syntax to use near
> 'INSERT IGNORE INTO spamdb VALUES ,INSERT IGNORE INTO spamdb VALUES ,INSERT
> IGNOR' at line 1
>
> I don't recall having seen this before.  I'm now using assp_db_import.cfg
> straight from CVS, no edits.  Do I need to edit it for use with MySQL,
> or should I?  It looks like the maximum number of records per bulk insert
> is 1000; should I change that?
>
>
>
> On Tue, Apr 28, 2015 at 10:15 AM, K Post <nntp.p...@gmail.com> wrote:
>
> > and note, looking periodically at the worker status window in the web
> > admin, I see "chkdb - finished" for quite some time after the 40k files
> > have been processed.  I think this is while spamdb is being generated.
> >
> > On Tue, Apr 28, 2015 at 9:29 AM, K Post <nntp.p...@gmail.com> wrote:
> >
> >> and why would the rebuild of hmm in berkeleydb take only seconds, but
> >> the spamdb in mysql (on same box) take 45 minutes?
> >>
> >> On Tue, Apr 28, 2015 at 9:28 AM, K Post <nntp.p...@gmail.com> wrote:
> >>
> >>> preventBulkImport is not checked.
> >>>
> >>> I've reinstalled the VM from scratch.  New OS installation, using the
> >>> perl distribution 5.20 from
> >>> http://sourceforge.net/projects/assp/files/ASSP%20V2%20multithreading/ASSP%20V2%20module%20installation/
> >>>
> >>> By parsing the files, I mean the step logged as "Apr-28-15 02:14:20
> >>> Processing... messages/notspam with 14,759 files".
> >>> I'm worried that just parsing through the 40k files is about 65% slower
> >>> than it is on the old production box using the same corpus (copied to
> >>> the dev machine), even though the old box has less than 1/2 the
> >>> processing power, 40% slower disks, and 1/4 the RAM.  That very old
> >>> installation doesn't have HMM in the code; yes, it's that old.  When
> >>> rb_processfolder runs in the latest version, is it doing more processing
> >>> of each file because of the HMM option?  I can't imagine why it would
> >>> take so much longer on the new, faster hardware.  Are there any temporary
> >>> code modifications I can make to see what's taking so long?
> >>>
> >>> Is there a spot in the code where I could also modify the bulk import of
> >>> spamdb during the rebuild?  As a test, I'd like to write the import
> >>> script out to a file so I can time how long the import itself takes.
> >>> Any suggestions on timing this would be great.
> >>>
> >>> I'm really struggling here, thanks for the help.
> >>>
> >>>
> >>> On Tue, Apr 28, 2015 at 4:19 AM, Thomas Eckardt <
> >>> thomas.ecka...@thockar.com> wrote:
> >>>
> >>>> Populating the SpamDB and HMMdb is a "DB Import". Check that
> >>>> 'preventBulkImport' is disabled!
> >>>>
> >>>> Thomas
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> From:    K Post <nntp.p...@gmail.com>
> >>>> To:      ASSP development mailing list <assp-test@lists.sourceforge.net>
> >>>> Date:    27.04.2015 20:32
> >>>> Subject: [Assp-test] MySQL vs BerkeleyDB
> >>>>
> >>>>
> >>>>
> >>>> Hi all-
> >>>>
> >>>> I'm having a rough go getting the rebuild process to rebuild spamdb
> >>>> quickly.  The HMM db, which I have on BerkeleyDB, rebuilds wonderfully,
> >>>> in under a minute.  However, spamdb, which uses MySQL, is taking over 45
> >>>> minutes.  That's no good.
> >>>>
> >>>> The real question is whether there is a downside to using BerkeleyDB
> >>>> for everything.
> >>>>
> >>>> In reality, I'd like to figure out why my installation is so slow with
> >>>> MySQL (and I've got another stalled-out thread going on that).  I worry
> >>>> about the lack of management tools for BerkeleyDB.  I'd be uncomfortable
> >>>> with the whitelist being in Berkeley.
> >>>>
> >>>>
> >>>> More info:
> >>>>
> >>>> ASSP and MySQL are running on the same Windows 2012 Hyper-V virtual
> >>>> machine with 16 GB RAM and a 4 GB RAM disk for c:/assp/tmpDB (using the
> >>>> imdisk driver).  The VM seems to be running quickly for all other tasks.
> >>>>
> >>>> I've got a corpus of around 15k spam, 15k not spam, and 5k errors for
> >>>> each of error-spam and error-notspam (so about 40k total).  It takes
> >>>> about 45 minutes to go through all of these messages, and I'm okay with
> >>>> that.
> >>>>
> >>>> MySQL is using the settings suggested by Thomas here:
> >>>> http://sourceforge.net/p/assp/mailman/message/29893302/
> >>>> though net_buffer_length is limited to 1M according to the
> >>>> documentation.
> >>>>
> >>>> Apr-27-15 13:23:47 start populating Spamdb with 1,140,905 records -
> >>>> Bayesian check is now disabled!
> >>>> Apr-27-15 14:07:09 Finished populating Spamdb with 1,140,905 records -
> >>>> Bayesian check is now enabled!
> >>>>
> >>>>
> >>>> I'd really like to stick with MySQL for spamdb and the other databases,
> >>>> with BerkeleyDB for HMM as recommended.  I just can't see doing that if
> >>>> the rebuild of spamdb will be so slow.
> >>>>
> >>>> What kind of speeds is everyone else seeing for the spamdb portion of
> >>>> the rebuild?
> >>>>
> >>>> I'd love some suggestions on speeding up MySQL or anything else.  Thank
> >>>> you
> >>>>
> >>>> Ken
> >>>>
> >>>>
>
> ------------------------------------------------------------------------------
> >>>> One dashboard for servers and applications across
> Physical-Virtual-Cloud
> >>>> Widest out-of-the-box monitoring support with 50+ applications
> >>>> Performance metrics, stats and reports that give you Actionable
> Insights
> >>>> Deep dive visibility with transaction tracing using APM Insight.
> >>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> >>>> _______________________________________________
> >>>> Assp-test mailing list
> >>>> Assp-test@lists.sourceforge.net
> >>>> https://lists.sourceforge.net/lists/listinfo/assp-test
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> DISCLAIMER:
> >>>> *******************************************************
> >>>> This email and any files transmitted with it may be confidential,
> >>>> legally
> >>>> privileged and protected in law and are intended solely for the use
> of
> >>>> the
> >>>>
> >>>> individual to whom it is addressed.
> >>>> This email was multiple times scanned for viruses. There should be no
> >>>> known virus in this email!
> >>>> *******************************************************