Hi Galen,

I'll try to give more information. The patch we applied on wip/solr (on git.biblibre.com) is attached.

I had another issue with a different SAX parser:
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /usr/lib/perl5/XML/LibXML/Error.pm line 217.
problem with :84 :
:61: parser error : Input is not proper UTF-8, indicate encoding !

I don't have any problems without this patch, but we would like to keep it to improve the performance of this call.

The SAX parser used is XML::LibXML::SAX::Parser.

Does this answer your questions?

Claire

From: Galen Charlton
Date: December 13, 2010 06:16
Subject: Re: Problem with MARC::File::XML
Message ID: aanlktinutec98fzq4dzz=t6hlvvyqe0nq4=uzepqp...@mail.gmail.com <http://www.nntp.perl.org/group/perl.perl4lib/;msgid=aanlktinutec98fzq4dzz=t6hlvvyqe0nq4=uzepqp...@mail.gmail.com>

Hi,

On Thu, Dec 2, 2010 at 5:09 AM, LAURENT Henri-Damien
<henridamien.laur...@biblibre.com> wrote:
> we are currently trying to use MARC::File::XML for multi-threaded
> decoding of records.
> We are encountering erratic parsing errors:
>   problem with :25 :
> not well-formed (invalid token) at line 32, column 27, byte 1216 at
> /usr/lib/perl5/XML/Parser.pm line 187
>
> <?xml version="1.0" encoding="UTF-8"?>
> <record
>     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>     xsi:schemaLocation="http://www.loc.gov/MARC21/slim
> http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"
>     xmlns="http://www.loc.gov/MARC21/slim">
>
>   <leader>01020nam0a2200301   4500</leader>
>   <datafield tag="010" ind1=" " ind2=" ">
>
> Has anyone ever run into such a problem?
> If so, is there a workaround?

Can you supply a test case?  Which SAX parser are you using?

Regards,

Galen
--
Galen Charlton
gmcha...@gmail.com

From 0ea67fd8fd6c52b25ce700d21f89b1176cd0763a Mon Sep 17 00:00:00 2001
From: Claire Hernandez <claire.hernan...@biblibre.com>
Date: Tue, 14 Dec 2010 16:12:05 +0100
Subject: [PATCH] [SOLR] first try to parrallelise solr call and data process

---
 C4/Search.pm |   26 ++++++++++++++++++++++++--
 1 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/C4/Search.pm b/C4/Search.pm
index 7afd80f..5e1c8a5 100755
--- a/C4/Search.pm
+++ b/C4/Search.pm
@@ -2957,6 +2957,10 @@ sub IndexRecord {
     my $indexes = GetIndexes( $recordtype );
     my $sc      = GetSolrConnection;
 
+    #for fork to have less mysql errors
+    my $dbh = C4::Context->dbh;
+    $dbh->{'mysql_auto_reconnect'} = 1;
+
     my @recordpush;
     for my $id ( @$recordids ) {
         my $record;
@@ -3019,14 +3023,32 @@ sub IndexRecord {
         }
         push @recordpush, $solrrecord;
 
-        if ( @recordpush == 5000 ) {
-            $sc->add( \@recordpush );
+        if ( @recordpush == 50 ) {
+            #fork
+            my $prevrecordpush = \@recordpush;
+            push_records($prevrecordpush, $sc);
             @recordpush = ();
+            #no fork
+            #$sc->add (@recordpush);
+            #@recordpush = ();
         }
     }
     $sc->add( \@recordpush );
 }
 
+sub push_records {
+    my $records = shift;
+    my $sc = shift;
+    my $child = fork;
+    if ($child != 0) {
+        my $dbh = C4::Context->dbh;
+        $sc->add( $records );
+        exit(0)
+    }else{
+        return
+    }
+}
+
 sub NormalizeDate {
     given( shift ) {
         when( /^(\d{2}).(\d{2}).(\d{4})$/ ) { return "$3-$2-$1T00:00:00Z" }
-- 
1.7.1
