Discussing the future format of the search index is also part of the upcoming 
developers meeting.

I'd like to encourage you to participate.

Please vote for a weekend that suits you for the developers meeting:
http://www.doodle.com/xaf4tzpuwk2xf59h

Greets,


Manuel


On Monday, 6 July 2009 15:15:45, [email protected] wrote:
> I have replied to Rotem about the links. I have also opened a bug on the
> Kiwix side:
> https://sourceforge.net/tracker/?func=detail&aid=2817440&group_id=175508&atid=873515
>
> For the search engine index size, we have to find a solution with a
> smaller index. Starting with the openZIM solution should be good.
> I will have a look this week.
>
> Emmanuel
>
>  On Mon 06/07/09 15:03, "Asaf Bartov" [email protected] wrote:
> > Clarification:
> >
> > This last message was by Rotem, a fellow WM-IL member helping me with
> > the embedding of the Hebrew Wikipedia in the One Computer Per Child
> > project.
> >
> > He is reporting issues with Kiwix and the ZIM file I created last
> > week.
> >
> > Regarding size:  Size is important, because we intend to add images
> > (the 300MB ZIM file is the complete Hebrew Wikipedia text, but no
> > pictures).  We are hoping to have at least 5GB reserved for us in
> > those One Computer Per Child machines we are to install on, but we may
> > > be forced to make do with 3GB.  So every MB saved from the index is
> > > another MB available for images...
> >
> >    Asaf Bartov
> >    Wikimedia Israel
> >
> > On Mon, Jul 6, 2009 at 3:58 PM, Rotem Simha  wrote:
> > * There are some errors in the links of files and special pages.
> > Examples:
> > קובץ:Nuvola_apps_important.svg [1] links to
> > ויקיפדיה:מיזמי ויקיפדיה/מיזם ערכים ללא תמונות/קטגוריות/ספורטאים איטלקים
> > (Wikipedia:Wikipedia projects/articles without images/categories/Sports
> > people from Italy)
> > מיוחד:אקראי (Special:Random) > 15 במאי (May 15)
> > מיוחד:שינויים אחרונים (Special:RecentChanges) > 10_באוגוסט (August 10)
> >
> > * size is important because we intend to add images
> >
> > 2009/7/6
> >
> > Today's Topics:
> >
> >   1. Kiwix index size (Asaf Bartov)
> >   2. Re: Kiwix index size (Manuel Schneider)
> >   3. Re: Kiwix index size (Emmanuel Engelhart)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Sun, 5 Jul 2009 19:18:57 +0300
> > From: Asaf Bartov
> > Subject: [openZIM dev-l] Kiwix index size
> > To: [email protected]
> > Message-ID:
> >      
> >  
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Hi, everyone.
> >
> > When running Kiwix's indexer on the ZIM file I had created from the
> > Hebrew Wikipedia last week, the Kiwix data directory ran up to a total
> > of 31 items, totalling 2.3 GB.  The ZIM file itself is ~300MB.  Does
> > this proportion make sense?
> >
> > Detailed ls output attached.
> >
> > Thanks in advance,
> >
> >   Asaf Bartov
> > --
> > Asaf Bartov
> > -------------- next part --------------
> > ro...@desktop:~/.www.kiwix.org/kiwix$ [4] ls -l -h -a -R
> > .:
> > total 16K
> > drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 .
> > drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 ..
> > drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 7680jxd5.default
> > -rw-r--r-- 1 rotem rotem   94 2009-07-01 16:10 profiles.ini
> >
> > ./7680jxd5.default:
> > total 1.7M
> > drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 .
> > drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 ..
> > drwxr-xr-x 2 rotem rotem 4.0K 2009-07-02 05:13 31c26198d06ad265677b450796cc09aa.index
> > -rw------- 1 rotem rotem  162 2009-07-05 18:19 compatibility.ini
> > -rw-r--r-- 1 rotem rotem 135K 2009-07-05 18:19 compreg.dat
> > drwxr-xr-x 2 rotem rotem 4.0K 2009-07-01 16:10 extensions
> > -rw-r--r-- 1 rotem rotem  169 2009-07-01 16:10 localstore.rdf
> > -rw-r--r-- 1 rotem rotem  304 2009-07-05 18:39 mimeTypes.rdf
> > -rw-r--r-- 1 rotem rotem    0 2009-07-05 18:40 .parentlock
> > -rw-r--r-- 1 rotem rotem 2.0K 2009-07-01 16:10 permissions.sqlite
> > -rw-r--r-- 1 rotem rotem 128K 2009-07-05 18:54 places.sqlite
> > -rw------- 1 rotem rotem  951 2009-07-05 19:00 prefs.js
> > -rw-r--r-- 1 rotem rotem 1.1M 2009-07-05 18:20 XPC.mfasl
> > -rw-r--r-- 1 rotem rotem  98K 2009-07-05 18:19 xpti.dat
> > -rw-r--r-- 1 rotem rotem  98K 2009-07-05 18:20 XUL.mfasl
> >
> > ./7680jxd5.default/31c26198d06ad265677b450796cc09aa.index:
> > total 2.4G
> > drwxr-xr-x 2 rotem rotem 4.0K 2009-07-02 05:13 .
> > drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 ..
> > -rw-r--r-- 1 rotem rotem    0 2009-07-02 01:46 flintlock
> > -rw-r--r-- 1 rotem rotem   12 2009-07-02 01:46 iamflint
> > -rw-r--r-- 1 rotem rotem  22K 2009-07-02 05:13 position.baseA
> > -rw-r--r-- 1 rotem rotem  21K 2009-07-02 05:10 position.baseB
> > -rw-r--r-- 1 rotem rotem 1.4G 2009-07-02 05:13 position.DB
> > -rw-r--r-- 1 rotem rotem  12K 2009-07-02 05:13 postlist.baseA
> > -rw-r--r-- 1 rotem rotem  12K 2009-07-02 05:10 postlist.baseB
> > -rw-r--r-- 1 rotem rotem 754M 2009-07-02 05:13 postlist.DB
> > -rw-r--r-- 1 rotem rotem   70 2009-07-02 05:13 record.baseA
> > -rw-r--r-- 1 rotem rotem   70 2009-07-02 05:10 record.baseB
> > -rw-r--r-- 1 rotem rotem 3.3M 2009-07-02 05:13 record.DB
> > -rw-r--r-- 1 rotem rotem 4.4K 2009-07-02 05:13 termlist.baseA
> > -rw-r--r-- 1 rotem rotem 4.3K 2009-07-02 05:10 termlist.baseB
> > -rw-r--r-- 1 rotem rotem 278M 2009-07-02 05:13 termlist.DB
> > -rw-r--r-- 1 rotem rotem  232 2009-07-02 05:13 value.baseA
> > -rw-r--r-- 1 rotem rotem  230 2009-07-02 05:10 value.baseB
> > -rw-r--r-- 1 rotem rotem  14M 2009-07-02 05:13 value.DB
> >
> > ./7680jxd5.default/extensions:
> > total 8.0K
> > drwxr-xr-x 2 rotem rotem 4.0K 2009-07-01 16:10 .
> > drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 ..
> > ro...@desktop:~/.www.kiwix.org/kiwix$ [5]
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Sun, 5 Jul 2009 20:57:39 +0200
> > From: Manuel Schneider
> > Subject: Re: [openZIM dev-l] Kiwix index size
> > To: [email protected], [email protected]
> > Message-ID:
> > Content-Type: text/plain;  charset="utf-8"
> >
> > Hi Asaf,
> >
> > On Sunday, 5 July 2009, Asaf Bartov wrote:
> > > When running Kiwix's indexer on the ZIM file I had created from the
> > > Hebrew Wikipedia last week, the Kiwix data directory ran up to a total
> > > of 31 items, totalling 2.3 GB.  The ZIM file itself is ~300MB.  Does
> > > this proportion make sense?
> >
> > I am not sure about the other files that were created; you only need
> > the ZIM file with the index itself.
> >
> > For 900000 articles, the ZIM file containing the articles was 1.4 GB
> > and the index ZIM was 1.0 GB.
> >
> > So I think 300 MB looks fine.
> >
> > Greets,
> >
> > Manuel
> > --
> > Regards
> > Manuel Schneider
> >
> > Wikimedia CH - Verein zur Förderung Freien Wissens
> > Wikimedia CH - Association for the advancement of free knowledge
> > www.wikimedia.ch [6]
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Sun, 05 Jul 2009 21:05:33 +0200
> > From: Emmanuel Engelhart
> > Subject: Re: [openZIM dev-l] Kiwix index size
> > To: [email protected], [email protected]
> > Message-ID:
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Hi Asaf
> >
> > Asaf Bartov wrote:
> > > When running Kiwix's indexer on the ZIM file I had created from the
> > > Hebrew Wikipedia last week, the Kiwix data directory ran up to a total
> > > of 31 items, totalling 2.3 GB.  The ZIM file itself is ~300MB.  Does
> > > this proportion make sense?
> >
> > This is possible. Kiwix uses the Xapian search engine, which generates
> > pretty big index files.
> >
> > I have two questions:
> > * Are the search results OK?
> > * Do you have a problem with the size of the index? Do you have a size
> > limit?
> >
> > There are many open search/indexing tools. I chose Xapian for many
> > reasons, but under certain conditions it would be possible to add
> > support for another search engine to Kiwix. It should also be possible
> > to make a modified version of the indexer that uses less disk space
> > (but indexes fewer words).
> >
> > openZIM itself provides a search solution; Tommi can tell you more
> > about it. Maybe it would be interesting for you to test it and give us
> > feedback!
> >
> > Regards
> > Emmanuel
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.4.9 (GNU/Linux)
> > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org [7]
> >
> > iEYEARECAAYFAkpQ+XcACgkQn3IpJRpNWtPm8wCfcmzwRfg6/9ttuknkURF7ct5I
> > JLAAoLbVJWqXUKIeh8Mpua3GD+bjI5ZD
> > =RH/U
> > -----END PGP SIGNATURE-----
> >
> > ------------------------------
> >
> > _______________________________________________
> > dev-l mailing list
> > [email protected]
> > https://intern.openzim.org/mailman/listinfo/dev-l [8]
> >
> > End of dev-l Digest, Vol 5, Issue 2
> > ***********************************
> >
> > --
> > Rotem Simha
> >
> > _______________________________________________
> > dev-l mailing list
> > [email protected]
> > https://intern.openzim.org/mailman/listinfo/dev-l [9]
> >
> > --
> > --
> > Asaf Bartov
> >
> >
> >
> > Links:
> > ------
> > [1] http://commons.wikimedia.org/wiki/File:Nuvola_apps_important.svg
> > [2] https://intern.openzim.org/mailman/listinfo/dev-l
> > [3] http://intern.openzim.org/pipermail/dev-l/attachments/20090705/2afee878/attachment.html
> > [4] http://www.kiwix.org/kiwix$
> > [5] http://www.kiwix.org/kiwix$
> > [6] http://www.wikimedia.ch
> > [7] http://enigmail.mozdev.org
> > [8] https://intern.openzim.org/mailman/listinfo/dev-l
> > [9] https://intern.openzim.org/mailman/listinfo/dev-l
>
> _______________________________________________
> dev-l mailing list
> [email protected]
> https://intern.openzim.org/mailman/listinfo/dev-l
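
To put the numbers from the quoted thread side by side, here is a rough sketch in Python. It only redoes the arithmetic from the ls output and from Asaf's 5GB/3GB figures; the function and variable names are made up for this illustration and are not part of Kiwix, Xapian or openZIM, and the sizes are the rounded values that ls -h printed.

```python
# Sizes of the large index files, copied from the ls output in the thread.
# ls -h uses binary units: K = 1024 bytes, M = 1024**2, G = 1024**3.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Convert an ls -h size such as '754M' or '70' to bytes."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


# Hypothetical name; the values are the rounded figures from the listing.
INDEX_FILES = {
    "position.DB": "1.4G",
    "postlist.DB": "754M",
    "termlist.DB": "278M",
    "value.DB": "14M",
    "record.DB": "3.3M",
}


def index_total_bytes(files=INDEX_FILES):
    """Sum of the listed index files, in bytes."""
    return sum(parse_size(size) for size in files.values())


def space_left_for_images(budget_gb, zim_mb=300, files=INDEX_FILES):
    """Bytes left for images after the ZIM file and the index are stored."""
    used = zim_mb * UNITS["M"] + index_total_bytes(files)
    return budget_gb * UNITS["G"] - used


if __name__ == "__main__":
    total = index_total_bytes()
    for name, size in INDEX_FILES.items():
        share = parse_size(size) / total
        print(f"{name:12s} {size:>5s} {share:6.1%}")
    for budget in (5, 3):
        left = space_left_for_images(budget)
        print(f"{budget} GB budget: {left / UNITS['G']:.2f} GB left for images")
```

Under these assumptions, position.DB alone is more than half of the index, and with a 3GB budget very little room would be left for images, which is why a smaller index matters here.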



-- 
Regards
Manuel Schneider

Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch