Discussing the future format of the search index is also part of the upcoming developers meeting.
I'd like to encourage you to participate. Please vote for a weekend that suits you for the developers meeting:
http://www.doodle.com/xaf4tzpuwk2xf59h

Greetings,

Manuel

On Monday, 6 July 2009 15:15:45, [email protected] wrote:
> I have replied to Rotem about the links. I have also opened a bug on the
> Kiwix side:
> https://sourceforge.net/tracker/?func=detail&aid=2817440&group_id=175508&atid=873515
>
> For the search engine index size, we have to find a solution with a
> smaller index. Starting with the openZIM solution should be good.
> I will have a look during this week.
>
> Emmanuel
>
> On Mon 06/07/09 15:03, "Asaf Bartov" [email protected] wrote:
> > Clarification:
> >
> > This last message was by Rotem, a fellow WM-IL member helping me with
> > the embedding of the Hebrew Wikipedia in the One Computer Per Child
> > project.
> >
> > He is reporting issues with Kiwix and the ZIM file I created last
> > week.
> >
> > Regarding size: Size is important, because we intend to add images
> > (the 300MB ZIM file is the complete Hebrew Wikipedia text, but no
> > pictures). We are hoping to have at least 5GB reserved for us on
> > those One Computer Per Child machines we are to install on, but we may
> > be forced to make do with 3GB. So every MB saved from the index is
> > another MB available for images...
> >
> > Asaf Bartov
> > Wikimedia Israel
> >
> > On Mon, Jul 6, 2009 at 3:58 PM, Rotem Simha wrote:
> > * there are some errors in links to files and special pages
> > examples:
> > קובץ:Nuvola_apps_important.svg [1] links to
> > ויקיפדיה:מיזמי ויקיפדיה/מיזם ערכים
> > ללא תמונות/קטגוריות/ספורטאים איטלקים
> > (wikipedia:wikipedia projects/articles without images/categories/Sports
> > people from Italy)
> > מיוחד:אקראי (Special:Random) > 15 במאי (May 15)
> > מיוחד:שינויים אחרונים (Special:RecentChanges) > 10_באוגוסט (August 10)
> >
> > * size is important because we intend to add images
> >
> > 2009/7/6
> > Send dev-l mailing list submissions to
> > [email protected]
> >
> > To subscribe or unsubscribe via the World Wide Web, visit
> > https://intern.openzim.org/mailman/listinfo/dev-l [2]
> > or, via email, send a message with subject or body help to
> > [email protected]
> >
> > You can reach the person managing the list at
> > [email protected]
> >
> > When replying, please edit your Subject line so it is more specific
> > than "Re: Contents of dev-l digest..."
> >
> > Today's Topics:
> >
> > 1. Kiwix index size (Asaf Bartov)
> > 2. Re: Kiwix index size (Manuel Schneider)
> > 3. Re: Kiwix index size (Emmanuel Engelhart)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Sun, 5 Jul 2009 19:18:57 +0300
> > From: Asaf Bartov
> > Subject: [openZIM dev-l] Kiwix index size
> > To: [email protected]
> > Message-ID:
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > Hi, everyone.
> >
> > When running Kiwix's indexer on the ZIM file I had created from the
> > Hebrew Wikipedia last week, the Kiwix data directory ran up to a total
> > of 31 items, totalling 2.3 GB. The ZIM file itself is ~300MB. Does this
> > proportion make sense?
> >
> > Detailed ls output attached.
> >
> > Thanks in advance,
> >
> > Asaf Bartov
> > --
> > Asaf Bartov
> > -------------- next part --------------
> > An HTML attachment was scrubbed...
> > URL: [3]
> > -------------- next part --------------
> > ro...@desktop:~/.www.kiwix.org/kiwix$ [4] ls -l -h -a -R
> > .:
> > total 16K
> > drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 .
> > drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 ..
> > drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 7680jxd5.default
> > -rw-r--r-- 1 rotem rotem 94 2009-07-01 16:10 profiles.ini
> >
> > ./7680jxd5.default:
> > total 1.7M
> > drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 .
> > drwx------ 3 rotem rotem 4.0K 2009-07-01 16:10 ..
> > drwxr-xr-x 2 rotem rotem 4.0K 2009-07-02 05:13 31c26198d06ad265677b450796cc09aa.index
> > -rw------- 1 rotem rotem 162 2009-07-05 18:19 compatibility.ini
> > -rw-r--r-- 1 rotem rotem 135K 2009-07-05 18:19 compreg.dat
> > drwxr-xr-x 2 rotem rotem 4.0K 2009-07-01 16:10 extensions
> > -rw-r--r-- 1 rotem rotem 169 2009-07-01 16:10 localstore.rdf
> > -rw-r--r-- 1 rotem rotem 304 2009-07-05 18:39 mimeTypes.rdf
> > -rw-r--r-- 1 rotem rotem 0 2009-07-05 18:40 .parentlock
> > -rw-r--r-- 1 rotem rotem 2.0K 2009-07-01 16:10 permissions.sqlite
> > -rw-r--r-- 1 rotem rotem 128K 2009-07-05 18:54 places.sqlite
> > -rw------- 1 rotem rotem 951 2009-07-05 19:00 prefs.js
> > -rw-r--r-- 1 rotem rotem 1.1M 2009-07-05 18:20 XPC.mfasl
> > -rw-r--r-- 1 rotem rotem 98K 2009-07-05 18:19 xpti.dat
> > -rw-r--r-- 1 rotem rotem 98K 2009-07-05 18:20 XUL.mfasl
> >
> > ./7680jxd5.default/31c26198d06ad265677b450796cc09aa.index:
> > total 2.4G
> > drwxr-xr-x 2 rotem rotem 4.0K 2009-07-02 05:13 .
> > drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 ..
> > -rw-r--r-- 1 rotem rotem 0 2009-07-02 01:46 flintlock
> > -rw-r--r-- 1 rotem rotem 12 2009-07-02 01:46 iamflint
> > -rw-r--r-- 1 rotem rotem 22K 2009-07-02 05:13 position.baseA
> > -rw-r--r-- 1 rotem rotem 21K 2009-07-02 05:10 position.baseB
> > -rw-r--r-- 1 rotem rotem 1.4G 2009-07-02 05:13 position.DB
> > -rw-r--r-- 1 rotem rotem 12K 2009-07-02 05:13 postlist.baseA
> > -rw-r--r-- 1 rotem rotem 12K 2009-07-02 05:10 postlist.baseB
> > -rw-r--r-- 1 rotem rotem 754M 2009-07-02 05:13 postlist.DB
> > -rw-r--r-- 1 rotem rotem 70 2009-07-02 05:13 record.baseA
> > -rw-r--r-- 1 rotem rotem 70 2009-07-02 05:10 record.baseB
> > -rw-r--r-- 1 rotem rotem 3.3M 2009-07-02 05:13 record.DB
> > -rw-r--r-- 1 rotem rotem 4.4K 2009-07-02 05:13 termlist.baseA
> > -rw-r--r-- 1 rotem rotem 4.3K 2009-07-02 05:10 termlist.baseB
> > -rw-r--r-- 1 rotem rotem 278M 2009-07-02 05:13 termlist.DB
> > -rw-r--r-- 1 rotem rotem 232 2009-07-02 05:13 value.baseA
> > -rw-r--r-- 1 rotem rotem 230 2009-07-02 05:10 value.baseB
> > -rw-r--r-- 1 rotem rotem 14M 2009-07-02 05:13 value.DB
> >
> > ./7680jxd5.default/extensions:
> > total 8.0K
> > drwxr-xr-x 2 rotem rotem 4.0K 2009-07-01 16:10 .
> > drwx------ 4 rotem rotem 4.0K 2009-07-05 19:00 ..
> > ro...@desktop:~/.www.kiwix.org/kiwix$ [5]
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Sun, 5 Jul 2009 20:57:39 +0200
> > From: Manuel Schneider
> > Subject: Re: [openZIM dev-l] Kiwix index size
> > To: [email protected], [email protected]
> > Message-ID:
> > Content-Type: text/plain; charset="utf-8"
> >
> > Hi Asaf,
> >
> > On Sunday, 5 July 2009, Asaf Bartov wrote:
> > > When running Kiwix's indexer on the ZIM file I had created from the
> > > Hebrew Wikipedia last week, the Kiwix data directory ran up to a total
> > > of 31 items, totalling 2.3 GB. The ZIM file itself is ~300MB. Does
> > > this proportion make sense?
> >
> > I am not sure about the other files that were created; you only need
> > the ZIM file with the index itself.
> >
> > For 900000 articles, the ZIM file containing the articles was 1.4 GB and
> > the index ZIM was 1.0 GB.
> >
> > So I think 300 MB looks fine.
> >
> > Greetings,
> >
> > Manuel
> > --
> > Regards
> > Manuel Schneider
> >
> > Wikimedia CH - Verein zur Förderung Freien Wissens
> > Wikimedia CH - Association for the advancement of free knowledge
> > www.wikimedia.ch [6]
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Sun, 05 Jul 2009 21:05:33 +0200
> > From: Emmanuel Engelhart
> > Subject: Re: [openZIM dev-l] Kiwix index size
> > To: [email protected], [email protected]
> > Message-ID:
> > Content-Type: text/plain; charset=ISO-8859-1
> >
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > Hi Asaf
> >
> > Asaf Bartov wrote:
> > > When running Kiwix's indexer on the ZIM file I had created from the
> > > Hebrew Wikipedia last week, the Kiwix data directory ran up to a total
> > > of 31 items, totalling 2.3 GB. The ZIM file itself is ~300MB. Does
> > > this proportion make sense?
> >
> > This is possible. Kiwix uses the Xapian search engine, which generates
> > pretty big index files.
> >
> > I have two questions:
> > * Are the search results OK?
> > * Do you have a problem with the size of the index? Do you have a size
> >   limit?
> >
> > There are many open search/index software packages. I chose to use
> > Xapian for many reasons, but under certain conditions it is possible to
> > add support for another search engine to Kiwix. It should also be
> > possible to make a modified version of the indexer that uses less disk
> > space (but with fewer words indexed).
> >
> > openZIM itself provides a search solution; Tommi can explain more
> > about it. Maybe it would be interesting for you to test it and give
> > us feedback!
> >
> > Regards
> > Emmanuel
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.4.9 (GNU/Linux)
> > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org [7]
> >
> > iEYEARECAAYFAkpQ+XcACgkQn3IpJRpNWtPm8wCfcmzwRfg6/9ttuknkURF7ct5I
> > JLAAoLbVJWqXUKIeh8Mpua3GD+bjI5ZD
> > =RH/U
> > -----END PGP SIGNATURE-----
> >
> > ------------------------------
> >
> > _______________________________________________
> > dev-l mailing list
> > [email protected]
> > https://intern.openzim.org/mailman/listinfo/dev-l [8]
> >
> > End of dev-l Digest, Vol 5, Issue 2
> > ***********************************
> >
> > --
> > Rotem Simha
> >
> > _______________________________________________
> > dev-l mailing list
> > [email protected]
> > https://intern.openzim.org/mailman/listinfo/dev-l [9]
> >
> > --
> > --
> > Asaf Bartov
> >
> >
> >
> > Links:
> > ------
> > [1] http://commons.wikimedia.org/wiki/File:Nuvola_apps_important.svg
> > [2] https://intern.openzim.org/mailman/listinfo/dev-l
> > [3] http://intern.openzim.org/pipermail/dev-l/attachments/20090705/2afee878/attachment.html
> > [4] http://www.kiwix.org/kiwix$
> > [5] http://www.kiwix.org/kiwix$
> > [6] http://www.wikimedia.ch
> > [7] http://enigmail.mozdev.org
> > [8] https://intern.openzim.org/mailman/listinfo/dev-l
> > [9] https://intern.openzim.org/mailman/listinfo/dev-l
>
> _______________________________________________
> dev-l mailing list
> [email protected]
> https://intern.openzim.org/mailman/listinfo/dev-l

--
Regards
Manuel Schneider

Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch

_______________________________________________
dev-l mailing list
[email protected]
https://intern.openzim.org/mailman/listinfo/dev-l
