RE: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!
Congrats Jan!

Uwe

-
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

From: Anshum Gupta
Sent: Thursday, February 18, 2021 7:55 PM
To: Lucene Dev; solr-user@lucene.apache.org
Subject: Congratulations to the new Apache Solr PMC Chair, Jan Høydahl!

Hi everyone,

I’d like to inform everyone that the newly formed Apache Solr PMC nominated and elected Jan Høydahl for the position of the Solr PMC Chair and Vice President. This decision was approved by the board in its February 2021 meeting.

Congratulations Jan!

--
Anshum Gupta
[SECURITY] CVE-2018-8026: XXE vulnerability due to Apache Solr configset upload (exchange rate provider config / enum field config / TIKA parsecontext)
CVE-2018-8026: XXE vulnerability due to Apache Solr configset upload (exchange rate provider config / enum field config / TIKA parsecontext)

Severity: High

Vendor: The Apache Software Foundation

Versions Affected:
Solr 6.0.0 to 6.6.4
Solr 7.0.0 to 7.3.1

Description: The details of this vulnerability were reported by mail to the Apache security mailing list. The vulnerability is an XML external entity expansion (XXE) in Solr config files (currency.xml, enumsConfig.xml referred from schema.xml, and the TIKA parsecontext config file). The XInclude functionality provided in these config files is affected in a similar way. The XXE can use the file/ftp/http protocols to read arbitrary local files from the Solr server or the internal network. Because the manipulated files can be uploaded as configsets using Solr's API, the vulnerability is remotely exploitable. See [1] for more details.

Mitigation: Users are advised to upgrade to either Solr 6.6.5 or Solr 7.4.0, both of which address the vulnerability. Once the upgrade is complete, no other steps are required. Those releases only allow external entities and XIncludes that refer to local files / zookeeper resources below the Solr instance directory (using Solr's ResourceLoader); usage of absolute URLs is denied. Keep in mind that external entities and XInclude are explicitly supported to better structure config files in large installations. Before Solr 6 this was no problem, as config files were not accessible through the APIs.

If users are unable to upgrade to Solr 6.6.5 or Solr 7.4.0, they are advised to make sure that Solr instances are only used locally without access to the public internet, so the vulnerability cannot be exploited. In addition, reverse proxies should be guarded so that end users cannot reach the configset APIs. Please refer to [2] on how to correctly secure Solr servers.

Solr 5.x and earlier are not affected by this vulnerability; those versions do not allow configsets to be uploaded via the API. Nevertheless, users should upgrade those versions as soon as possible, because there may be other ways to inject config files through the file upload functionality of the old web interface. Those versions are no longer maintained, so no deep analysis was done.

Credit: Yuyang Xiao, Ishan Chattopadhyaya

References:
[1] https://issues.apache.org/jira/browse/SOLR-12450
[2] https://wiki.apache.org/solr/SolrSecurity

-
Uwe Schindler
uschind...@apache.org
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/
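As a general illustration of the mitigation idea (not Solr's actual patch, which routes entity resolution through its ResourceLoader), an application parsing untrusted XML can disable DOCTYPEs, external entities and XInclude via standard JAXP features; a minimal hedged sketch:

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

public final class SafeXmlParsing {

    // Returns a JAXP parser that refuses DOCTYPEs, external entities and
    // XInclude, so a crafted config file cannot read local files or URLs.
    public static DocumentBuilder newSafeBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf.newDocumentBuilder();
    }
}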
[SECURITY] CVE-2018-8010: XXE vulnerability due to Apache Solr configset upload
CVE-2018-8010: XXE vulnerability due to Apache Solr configset upload

Severity: High

Vendor: The Apache Software Foundation

Versions Affected:
Solr 6.0.0 to 6.6.3
Solr 7.0.0 to 7.3.0

Description: The details of this vulnerability were reported internally by one of Apache Solr's committers. The vulnerability is an XML external entity expansion (XXE) in Solr config files (solrconfig.xml, schema.xml, managed-schema). The XInclude functionality provided in these config files is affected in a similar way. The XXE can use the file/ftp/http protocols to read arbitrary local files from the Solr server or the internal network. See [1] for more details.

Mitigation: Users are advised to upgrade to either Solr 6.6.4 or Solr 7.3.1, both of which address the vulnerability. Once the upgrade is complete, no other steps are required. Those releases only allow external entities and XIncludes that refer to local files / zookeeper resources below the Solr instance directory (using Solr's ResourceLoader); usage of absolute URLs is denied. Keep in mind that external entities and XInclude are explicitly supported to better structure config files in large installations. Before Solr 6 this was no problem, as config files were not accessible through the APIs.

If users are unable to upgrade to Solr 6.6.4 or Solr 7.3.1, they are advised to make sure that Solr instances are only used locally without access to the public internet, so the vulnerability cannot be exploited. In addition, reverse proxies should be guarded so that end users cannot reach the configset APIs. Please refer to [2] on how to correctly secure Solr servers.

Solr 5.x and earlier are not affected by this vulnerability; those versions do not allow configsets to be uploaded via the API. Nevertheless, users should upgrade those versions as soon as possible, because there may be other ways to inject config files through the file upload functionality of the old web interface. Those versions are no longer maintained, so no deep analysis was done.

Credit: Ananthesh, Ishan Chattopadhyaya

References:
[1] https://issues.apache.org/jira/browse/SOLR-12316
[2] https://wiki.apache.org/solr/SolrSecurity

-
Uwe Schindler
uschind...@apache.org
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/
[SECURITY] CVE-2018-1308: XXE attack through Apache Solr's DIH's dataConfig request parameter
CVE-2018-1308: XXE attack through Apache Solr's DIH's dataConfig request parameter

Severity: Major

Vendor: The Apache Software Foundation

Versions Affected:
Solr 1.2 to 6.6.2
Solr 7.0.0 to 7.2.1

Description: The details of this vulnerability were reported to the Apache security mailing list. The vulnerability is an XML external entity expansion (XXE) in the `&dataConfig=` parameter of Solr's DataImportHandler. The XXE can use the file/ftp/http protocols to read arbitrary local files from the Solr server or the internal network. See [1] for more details.

Mitigation: Users are advised to upgrade to either Solr 6.6.3 or Solr 7.3.0, both of which address the vulnerability. Once the upgrade is complete, no other steps are required. Those releases disable external entities in anonymous XML files passed through this request parameter.

If users are unable to upgrade to Solr 6.6.3 or Solr 7.3.0, they are advised to disable the DataImportHandler in their solrconfig.xml file and restart their Solr instances. Alternatively, if Solr instances are only used locally without access to the public internet, the vulnerability cannot be used directly, so updating may not be required; instead, reverse proxies or Solr client applications should be guarded so that end users cannot inject `dataConfig` request parameters. Please refer to [2] on how to correctly secure Solr servers.

Credit: 麦 香浓郁

References:
[1] https://issues.apache.org/jira/browse/SOLR-11971
[2] https://wiki.apache.org/solr/SolrSecurity

-
Uwe Schindler
uschind...@apache.org
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/
FOSS Backstage Micro Summit on Monday in Berlin
Hi,

It's already a bit late, but for all people who are visiting Germany next week and want to make a short trip to Berlin: there are still slots free at the FOSS Backstage Micro Summit. It is a mini conference on everything related to governance, collaboration, legal and economics within the scope of FOSS. The main event will take place as part of Berlin Buzzwords 2018. We have a lot of speakers invited - also from the ASF!

https://www.foss-backstage.de/
Program: https://www.foss-backstage.de/news/micro-summit-program-online-now

I hope to see you there,
Uwe

-----
Uwe Schindler
uschind...@apache.org
ASF Member, Apache Lucene PMC / Committer
Bremen, Germany
http://lucene.apache.org/
RE: TIKA OCR not working
Hi,

TIKA OCR is definitely working automatically with Solr 5.x. It is just important to install Tesseract OCR on the PATH (it is a native tool that does the actual work). On Ubuntu Linux, this should be quite simple ("apt-get install tesseract-ocr" or similar). You may also need to install additional language packs for better results. As long as the native tools are installed - and unless you are on a Turkish-localized machine, which triggers a bug in the JDK when spawning external processes - it should work out of the box, no configuration needed. Please also check the log files.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Allison, Timothy B. [mailto:talli...@mitre.org]
> Sent: Monday, April 27, 2015 4:27 PM
> To: u...@tika.apache.org
> Cc: trung...@anlab.vn; solr-user@lucene.apache.org
> Subject: FW: TIKA OCR not working
>
> Trung,
>
> I haven't experimented with our OCR parser yet, but this should give a good
> start: https://wiki.apache.org/tika/TikaOCR .
>
> Have you installed tesseract?
>
> Tika colleagues,
> Any other tips? What else has to be configured and how?
>
> -Original Message-
> From: trung.ht [mailto:trung...@anlab.vn]
> Sent: Friday, April 24, 2015 11:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: TIKA OCR not working
>
> Hi everyone,
>
> Does anyone have the answer for this problem :)?
>
> > I saw the document of Tika. Tika 1.7 supports OCR and Solr 5.0 uses Tika 1.7,
> > but it looks like it does not work. Does anyone know whether TIKA OCR
> > works automatically with Solr or I have to change some settings?
>
> Trung.
>
> >> It's not clear if OCR would happen automatically in Solr Cell, or if
> >> changes to Solr would be needed.
> >>
> >> For Tika OCR info, see:
> >>
> >> https://issues.apache.org/jira/browse/TIKA-93
> >> https://wiki.apache.org/tika/TikaOCR
> >>
> >> -- Jack Krupansky
> >>
> >> On Thu, Apr 23, 2015 at 9:14 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> >>
> >> > I think OCR is in Tika 1.8, so might be in Solr 5.?. But I haven't seen it
> >> > in use yet.
> >> >
> >> > Regards,
> >> > Alex
> >> > On 23 Apr 2015 10:24 pm, "Ahmet Arslan" wrote:
> >> >
> >> > > Hi Trung,
> >> > >
> >> > > I didn't know about OCR capabilities of tika.
> >> > > Someone who is familiar with solr-cell can inform us whether this
> >> > > functionality is added to solr or not.
> >> > >
> >> > > Ahmet
> >> > >
> >> > > On Thursday, April 23, 2015 2:06 PM, trung.ht wrote:
> >> > > Hi Ahmet,
> >> > >
> >> > > I used a png file, not a pdf file. From the document, I understand that
> >> > > solr will post the file to tika, and since tika 1.7, OCR is included. Is
> >> > > there something I misunderstood.
> >> > >
> >> > > Trung.
> >> > >
> >> > > On Thu, Apr 23, 2015 at 5:59 PM, Ahmet Arslan wrote:
> >> > >
> >> > > > Hi Trung,
> >> > > >
> >> > > > solr-cell (tika) does not do OCR. It cannot extract text from image based
> >> > > > pdfs.
> >> > > >
> >> > > > Ahmet
> >> > > >
> >> > > > On Thursday, April 23, 2015 7:33 AM, trung.ht wrote:
> >> > > >
> >> > > > Hi,
> >> > > >
> >> > > > I want to use solr to index some scanned documents. After setting up a
> >> > > > solr document with the two fields "content" and "filename", I tried to
> >> > > > upload the attached file, but it seems that the content of the file is
> >> > > > only "\n \n \n".
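For anyone wiring this up programmatically, here is a hedged SolrJ 5.x sketch that posts an image to the extracting request handler (Solr Cell), which hands the file to Tika and triggers OCR when Tesseract is installed; the core name "docs", the unique key field "id" and the "content" field mapping are assumptions to adapt:

import java.io.File;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class OcrUpload {
    public static void main(String[] args) throws Exception {
        // Assumed core name "docs"; adjust to your setup.
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/docs");
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("scan.png"), "image/png"); // Tika parses it; OCR runs if Tesseract is on the PATH
        req.setParam("literal.id", "scan-1");           // assumed unique key field "id"
        req.setParam("fmap.content", "content");        // map extracted text into the assumed "content" field
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        client.request(req);
        client.close();
    }
}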
RE: TIKA OCR not working
Yes that is fixed.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
> Sent: Monday, April 27, 2015 4:29 PM
> To: u...@tika.apache.org
> Cc: trung...@anlab.vn; solr-user@lucene.apache.org
> Subject: Re: TIKA OCR not working
>
> It should work out of the box in Solr as long as Tesseract is installed and on
> the class path. Solr had an issue with it since Tika sends 2 startDocument calls,
> but I fixed that with Uwe and it was shipped in 4.10.4 and in 5.x I think?
>
> ++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattm...@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
> -Original Message-
> From: Allison, "Timothy B."
> Reply-To: "u...@tika.apache.org"
> Date: Monday, April 27, 2015 at 10:26 AM
> To: "u...@tika.apache.org"
> Cc: "trung...@anlab.vn", "solr-user@lucene.apache.org"
> Subject: FW: TIKA OCR not working
>
> >Trung,
> >
> >I haven't experimented with our OCR parser yet, but this should give a
> >good start: https://wiki.apache.org/tika/TikaOCR .
> >
> >Have you installed tesseract?
> >
> >Tika colleagues,
> > Any other tips? What else has to be configured and how?
> >
> >-Original Message-
> >From: trung.ht [mailto:trung...@anlab.vn]
> >Sent: Friday, April 24, 2015 11:22 PM
> >To: solr-user@lucene.apache.org
> >Subject: Re: TIKA OCR not working
> >
> >Hi everyone,
> >
> >Does anyone have the answer for this problem :)?
> >
> >> I saw the document of Tika. Tika 1.7 supports OCR and Solr 5.0 uses Tika 1.7,
> >> but it looks like it does not work. Does anyone know whether TIKA OCR
> >> works automatically with Solr or I have to change some settings?
> >
> >Trung.
> >
> >>> It's not clear if OCR would happen automatically in Solr Cell, or if
> >>> changes to Solr would be needed.
> >>>
> >>> For Tika OCR info, see:
> >>>
> >>> https://issues.apache.org/jira/browse/TIKA-93
> >>> https://wiki.apache.org/tika/TikaOCR
> >>>
> >>> -- Jack Krupansky
> >>>
> >>> On Thu, Apr 23, 2015 at 9:14 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> >>>
> >>> > I think OCR is in Tika 1.8, so might be in Solr 5.?. But I haven't seen it
> >>> > in use yet.
> >>> >
> >>> > Regards,
> >>> > Alex
> >>> > On 23 Apr 2015 10:24 pm, "Ahmet Arslan" wrote:
> >>> >
> >>> > > Hi Trung,
> >>> > >
> >>> > > I didn't know about OCR capabilities of tika.
> >>> > > Someone who is familiar with solr-cell can inform us whether
> >>> > > this functionality is added to solr or not.
> >>> > >
> >>> > > Ahmet
> >>> > >
> >>> > > On Thursday, April 23, 2015 2:06 PM, trung.ht wrote:
> >>> > > Hi Ahmet,
> >>> > >
> >>> > > I used a png file, not a pdf file. From the document, I understand that
> >>> > > solr will post the file to tika, and since tika 1.7, OCR is included. Is
> >>> > > there something I misunderstood.
> >>> > >
> >>> > > Trung.
> >>> > >
> >>> > > On Thu, Apr 23, 2015 at 5:59 PM, Ahmet Arslan wrote:
ApacheCon NA 2015 in Austin, Texas
Dear Apache Lucene/Solr enthusiast,

In just a few weeks, we'll be holding ApacheCon in Austin, Texas, and we'd love to have you in attendance. You can save $300 on admission by registering NOW, since the early bird price ends on the 21st. Register at http://s.apache.org/acna2015-reg

ApacheCon this year celebrates the 20th birthday of the Apache HTTP Server, and we'll have Brian Behlendorf, who started this whole thing, keynoting for us, and you'll have a chance to meet some of the original Apache Group, who will be there to celebrate with us.

We also have talks about Apache Lucene and Apache Solr in 7 tracks of great talks, as well as BOFs, the Apache BarCamp, project-specific hack events, and evening events where you can deepen your connection with the larger Apache community.

See the full schedule at http://apacheconna2015.sched.org/

And if you have any questions, comments, or just want to hang out with us before and during the event, follow us on Twitter - @apachecon - or drop by #apachecon on the Freenode IRC network.

Hope to see you in Austin!

-
Uwe Schindler
uschind...@apache.org
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/
RE: Reminder: FOSDEM 2015 - Open Source Search Dev Room
Hello everyone,

We have extended the deadline for submissions to the FOSDEM 2015 Open Source Search Dev Room to Monday, 9 December at 23:59 CET. We are looking forward to your talk proposal!

Cheers,
Uwe

-
Uwe Schindler
uschind...@apache.org
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/

> -Original Message-
> From: Uwe Schindler [mailto:uschind...@apache.org]
> Sent: Monday, November 24, 2014 9:33 AM
> To: d...@lucene.apache.org; java-u...@lucene.apache.org; solr-u...@lucene.apache.org; gene...@lucene.apache.org
> Subject: Reminder: FOSDEM 2015 - Open Source Search Dev Room
>
> Hi,
>
> We host a Dev-Room about "Open Source Search" at this year's FOSDEM 2015
> (https://fosdem.org/2015/), taking place on January 31st and February 1st,
> 2015, in Brussels, Belgium. There is still one more week to submit your
> talks, so hurry up and submit your talk early!
>
> Here is the full CFP as posted a few weeks ago:
>
> Search has evolved to be much more than simply full-text search. We now
> rely on “search engines” for a wide variety of functionality:
> search as navigation, search as analytics and backend for data visualization
> and sometimes, dare we say it, as a data store. The purpose of this dev room
> is to explore the new world of open source search engines: their enhanced
> functionality, new use cases, feature and architectural deep dives, and the
> position of search in relation to the wider set of software tools.
>
> We welcome proposals from folks working with or on open source search
> engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.)
> or technologies that heavily depend upon search (e.g. NoSQL databases,
> Nutch, Apache Hadoop). We are particularly interested in presentations on
> search algorithms, machine learning, real-world implementation/deployment
> stories and explorations of the future of search.
>
> Talks should be 30-60 minutes in length, including time for Q&A.
>
> You can submit your talks to us here:
> https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V38G0OxSfp84A/viewform
>
> Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We
> cannot guarantee we will have the opportunity to review submissions made
> after the deadline, so please submit early (and often)!
>
> Should you have any questions, you can contact the Dev Room
> organizers: opensourcesearch-devr...@lists.fosdem.org
>
> Cheers,
> LH on behalf of the Open Source Search Dev Room Program Committee*
>
> * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten
> Curdt, Uwe Schindler
>
> -
> Uwe Schindler
> uschind...@apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
Reminder: FOSDEM 2015 - Open Source Search Dev Room
Hi,

We host a Dev-Room about "Open Source Search" at this year's FOSDEM 2015 (https://fosdem.org/2015/), taking place on January 31st and February 1st, 2015, in Brussels, Belgium. There is still one more week to submit your talks, so hurry up and submit your talk early!

Here is the full CFP as posted a few weeks ago:

Search has evolved to be much more than simply full-text search. We now rely on “search engines” for a wide variety of functionality: search as navigation, search as analytics and backend for data visualization and sometimes, dare we say it, as a data store. The purpose of this dev room is to explore the new world of open source search engines: their enhanced functionality, new use cases, feature and architectural deep dives, and the position of search in relation to the wider set of software tools.

We welcome proposals from folks working with or on open source search engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.) or technologies that heavily depend upon search (e.g. NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in presentations on search algorithms, machine learning, real-world implementation/deployment stories and explorations of the future of search.

Talks should be 30-60 minutes in length, including time for Q&A.

You can submit your talks to us here:
https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V38G0OxSfp84A/viewform

Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We cannot guarantee we will have the opportunity to review submissions made after the deadline, so please submit early (and often)!

Should you have any questions, you can contact the Dev Room organizers: opensourcesearch-devr...@lists.fosdem.org

Cheers,
LH on behalf of the Open Source Search Dev Room Program Committee*

* Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten Curdt, Uwe Schindler

-
Uwe Schindler
uschind...@apache.org
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/
RE: FOSDEM 2015 - Open Source Search Dev Room
Hi,

forgot to mention: FOSDEM 2015 takes place in Brussels on January 31st and February 1st, 2015. See also: https://fosdem.org/2015/

I hope to see you there!
Uwe

> -Original Message-
> From: Uwe Schindler [mailto:uschind...@apache.org]
> Sent: Monday, November 03, 2014 1:29 PM
> To: d...@lucene.apache.org; java-u...@lucene.apache.org; solr-u...@lucene.apache.org; gene...@lucene.apache.org
> Subject: CFP: FOSDEM 2015 - Open Source Search Dev Room
>
> ***Please forward this CFP to anyone who may be interested in participating.***
>
> Hi,
>
> Search has evolved to be much more than simply full-text search. We now
> rely on “search engines” for a wide variety of functionality:
> search as navigation, search as analytics and backend for data visualization
> and sometimes, dare we say it, as a data store. The purpose of this dev room
> is to explore the new world of open source search engines: their enhanced
> functionality, new use cases, feature and architectural deep dives, and the
> position of search in relation to the wider set of software tools.
>
> We welcome proposals from folks working with or on open source search
> engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.)
> or technologies that heavily depend upon search (e.g. NoSQL databases,
> Nutch, Apache Hadoop). We are particularly interested in presentations on
> search algorithms, machine learning, real-world implementation/deployment
> stories and explorations of the future of search.
>
> Talks should be 30-60 minutes in length, including time for Q&A.
>
> You can submit your talks to us here:
> https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V38G0OxSfp84A/viewform
>
> Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We
> cannot guarantee we will have the opportunity to review submissions made
> after the deadline, so please submit early (and often)!
>
> Should you have any questions, you can contact the Dev Room
> organizers: opensourcesearch-devr...@lists.fosdem.org
>
> Cheers,
> LH on behalf of the Open Source Search Dev Room Program Committee*
>
> * Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten
> Curdt, Uwe Schindler
>
> -
> Uwe Schindler
> uschind...@apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
CFP: FOSDEM 2015 - Open Source Search Dev Room
***Please forward this CFP to anyone who may be interested in participating.***

Hi,

Search has evolved to be much more than simply full-text search. We now rely on “search engines” for a wide variety of functionality: search as navigation, search as analytics and backend for data visualization and sometimes, dare we say it, as a data store. The purpose of this dev room is to explore the new world of open source search engines: their enhanced functionality, new use cases, feature and architectural deep dives, and the position of search in relation to the wider set of software tools.

We welcome proposals from folks working with or on open source search engines (e.g. Apache Lucene, Apache Solr, Elasticsearch, Seeks, Sphinx, etc.) or technologies that heavily depend upon search (e.g. NoSQL databases, Nutch, Apache Hadoop). We are particularly interested in presentations on search algorithms, machine learning, real-world implementation/deployment stories and explorations of the future of search.

Talks should be 30-60 minutes in length, including time for Q&A.

You can submit your talks to us here:
https://docs.google.com/forms/d/11yLMj9ZlRD1EMU3Knp5y6eO3H5BRK7V38G0OxSfp84A/viewform

Our Call for Papers will close at 23:59 CEST on Monday, December 1, 2014. We cannot guarantee we will have the opportunity to review submissions made after the deadline, so please submit early (and often)!

Should you have any questions, you can contact the Dev Room organizers: opensourcesearch-devr...@lists.fosdem.org

Cheers,
LH on behalf of the Open Source Search Dev Room Program Committee*

* Boaz Leskes, Isabel Drost-Fromm, Leslie Hawthorn, Ted Dunning, Torsten Curdt, Uwe Schindler

-----
Uwe Schindler
uschind...@apache.org
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/
[ANNOUNCE] [SECURITY] Recommendation to update Apache POI in Apache Solr 4.8.0, 4.8.1, and 4.9.0 installations
Hello Apache Solr users,

the Apache Lucene PMC wants to make the users of Solr aware of the following issue:

Apache Solr versions 4.8.0, 4.8.1, and 4.9.0 bundle Apache POI 3.10-beta2 with their binary release tarball. This version (and all previous ones) of Apache POI is vulnerable to the following issues:

= CVE-2014-3529: XML External Entity (XXE) problem in Apache POI's OpenXML parser =

Type: Information disclosure

Description: Apache POI uses Java's XML components to parse OpenXML files produced by Microsoft Office products (DOCX, XLSX, PPTX, ...). Applications that accept such files from end-users are vulnerable to XML External Entity (XXE) attacks, which allow remote attackers to bypass security restrictions and read arbitrary files via a crafted OpenXML document that provides an XML external entity declaration in conjunction with an entity reference.

= CVE-2014-3574: XML Entity Expansion (XEE) problem in Apache POI's OpenXML parser =

Type: Denial of service

Description: Apache POI uses Java's XML components and Apache XmlBeans to parse OpenXML files produced by Microsoft Office products (DOCX, XLSX, PPTX, ...). Applications that accept such files from end-users are vulnerable to XML Entity Expansion (XEE) attacks ("XML bombs"), which allow remote attackers to consume large amounts of CPU resources.

The Apache POI PMC released a bugfix version (3.10.1) today.

Solr users are affected by these issues if they enable the "Apache Solr Content Extraction Library (Solr Cell)" contrib module from the folder "contrib/extraction" of the release tarball. Users of Apache Solr are strongly advised to keep the module disabled if they don't use it.

Alternatively, users of Apache Solr 4.8.0, 4.8.1, or 4.9.0 can update the affected libraries by replacing the vulnerable JAR files in the distribution folder. Users of previous versions have to update their Solr release first; patching older versions is impossible.

To replace the vulnerable JAR files follow these steps:

- Download the Apache POI 3.10.1 binary release: http://poi.apache.org/download.html#POI-3.10.1
- Unzip the archive
- Delete the following files in your "solr-4.X.X/contrib/extraction/lib" folder:
  # poi-3.10-beta2.jar
  # poi-ooxml-3.10-beta2.jar
  # poi-ooxml-schemas-3.10-beta2.jar
  # poi-scratchpad-3.10-beta2.jar
  # xmlbeans-2.3.0.jar
- Copy the following files from the base folder of the Apache POI distribution to the "solr-4.X.X/contrib/extraction/lib" folder:
  # poi-3.10.1-20140818.jar
  # poi-ooxml-3.10.1-20140818.jar
  # poi-ooxml-schemas-3.10.1-20140818.jar
  # poi-scratchpad-3.10.1-20140818.jar
- Copy "xmlbeans-2.6.0.jar" from POI's "ooxml-lib/" folder to the "solr-4.X.X/contrib/extraction/lib" folder.
- Verify that the "solr-4.X.X/contrib/extraction/lib" folder no longer contains any files with version number "3.10-beta2".
- Verify that the folder contains one xmlbeans JAR file with version 2.6.0.

If you just want to disable extraction of Microsoft Office documents, delete the files above and don't replace them. "Solr Cell" will automatically detect this and disable Microsoft Office document extraction.

Coming versions of Apache Solr will have the updated libraries bundled.

Happy Searching and Extracting,
The Apache Lucene Developers

PS: Thanks to Stefan Kopf, Mike Boufford, and Christian Schneider for reporting these issues!

-
Uwe Schindler
uschind...@apache.org
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/
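The verification steps above can be scripted; as a hedged helper (the release folder name "solr-4.9.0" is an assumption to adapt), this small sketch flags any leftover vulnerable jars:

import java.io.File;

public class CheckPoiJars {
    public static void main(String[] args) {
        // Assumed layout of a Solr 4.x binary release; adjust the version in the path.
        File lib = new File("solr-4.9.0/contrib/extraction/lib");
        File[] jars = lib.listFiles();
        if (jars == null) {
            System.err.println("Folder not found: " + lib);
            return;
        }
        boolean clean = true;
        for (File f : jars) {
            // All vulnerable POI jars carry the "3.10-beta2" version string,
            // and the old xmlbeans-2.3.0.jar must be gone as well.
            if (f.getName().contains("3.10-beta2") || f.getName().equals("xmlbeans-2.3.0.jar")) {
                System.out.println("Still vulnerable: " + f.getName());
                clean = false;
            }
        }
        System.out.println(clean ? "No vulnerable POI/XmlBeans jars found." : "Update incomplete!");
    }
}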
[ANNOUNCE] Apache Solr 4.8.0 released
28 April 2014, Apache Solr™ 4.8.0 available

The Lucene PMC is pleased to announce the release of Apache Solr 4.8.0.

Solr is the popular, blazing fast, open source NoSQL search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing fault tolerant distributed search and indexing, and powers the search and navigation features of many of the world's largest internet sites.

Solr 4.8.0 is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

See the CHANGES.txt file included with the release for a full list of details.

Solr 4.8.0 Release Highlights:

* Apache Solr now requires Java 7 or greater (recommended is Oracle Java 7 or OpenJDK 7, minimum update 55; earlier versions have known JVM bugs affecting Solr).
* Apache Solr is fully compatible with Java 8.
* <fields> and <types> tags have been deprecated from schema.xml. There is no longer any reason to keep them in the schema file; they may be safely removed. This allows intermixing of <fieldType>, <field> and <copyField> definitions if desired.
* The new {!complexphrase} query parser supports wildcards, ORs, etc. inside phrase queries.
* The new Collections API CLUSTERSTATUS action reports the status of collections, shards, and replicas, and also lists collection aliases and cluster properties.
* Added managed synonym and stopword filter factories, which enable synonym and stopword lists to be dynamically managed via REST API.
* JSON updates now support nested child documents, enabling {!child} and {!parent} block join queries.
* Added ExpandComponent to expand results collapsed by the CollapsingQParserPlugin, as well as the parent/child relationship of nested child documents.
* Long-running Collections API tasks can now be executed asynchronously; the new REQUESTSTATUS action provides status.
* Added a hl.qparser parameter to allow you to define a query parser for hl.q highlight queries.
* In Solr single-node mode, cores can now be created using named configsets.
* The new DocExpirationUpdateProcessorFactory supports computing an expiration date for documents from a "TTL" expression, as well as automatically deleting expired documents on a periodic basis.

Solr 4.8.0 also includes many other new features as well as numerous optimizations and bugfixes of the corresponding Apache Lucene release.

Please report any feedback to the mailing lists (http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access.

-
Uwe Schindler
uschind...@apache.org
Apache Lucene PMC Chair / Committer
Bremen, Germany
http://lucene.apache.org/
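As a small, hedged SolrJ illustration of the new {!complexphrase} parser (the core name "collection1" and the "name" field are assumptions; the wildcard/fuzzy-inside-a-phrase syntax follows the Solr documentation):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ComplexPhraseDemo {
    public static void main(String[] args) throws Exception {
        // SolrJ 4.8-era client; adjust URL/core to your setup.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        // Wildcards and fuzzy terms inside a phrase, which classic phrase queries reject:
        SolrQuery q = new SolrQuery("{!complexphrase inOrder=true}name:\"jo* smyth~\"");
        QueryResponse rsp = server.query(q);
        System.out.println("hits: " + rsp.getResults().getNumFound());
        server.shutdown();
    }
}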
Attention: Lucene 4.8 and Solr 4.8 will require minimum Java 7
Hi,

the Apache Lucene/Solr committers decided with a large majority on the vote to require Java 7 for the next minor release of Apache Lucene and Apache Solr (version 4.8)! Support for Java 6 by Oracle already ended more than a year ago and Java 8 is coming out in a few days.

The next release will also contain some improvements for Java 7:

- Better file handling (especially on Windows) in the directory implementations. Files can now be deleted on Windows although the index is still open - like it was always possible on Unix environments (delete on last close semantics).
- Speed improvements in sorting comparators: Sorting now uses Java 7's own comparators for integer and long sorts, which are highly optimized by the Hotspot VM.

If you want to stay up-to-date with Lucene and Solr, you should upgrade your infrastructure to Java 7. Please be aware that you must use at least Java 7u1. The recommended version at the moment is Java 7u25. Later versions like 7u40, 7u45, ... have a bug causing index corruption. Ideally use the Java 7u60 prerelease, which has fixed this bug. Once 7u60 is out, this will be the recommended version.

In addition, there is no Oracle/BEA JRockit available for Java 7; use the official Oracle Java 7. JRockit was never working correctly with Lucene/Solr (causing index corruption), so this should not be an issue for you. Please also review our list of JVM bugs: http://wiki.apache.org/lucene-java/JavaBugs

Apache Lucene and Apache Solr were also heavily tested with all prerelease versions of Java 8, so you can also give it a try! Looking forward to the official Java 8 release next week - I will run my indexes with that version for sure!

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
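The comparator improvement refers to the static comparison helpers Java 7 added to the boxed numeric classes; a minimal sketch of the pattern (not Lucene's actual comparator code):

import java.util.Arrays;
import java.util.Comparator;

public class CompareDemo {
    public static void main(String[] args) {
        Long[] docValues = { 42L, -7L, 13L };
        // Java 7 added Integer.compare/Long.compare, which avoid the classic
        // "(int) (a - b)" overflow bug and are well optimized by Hotspot.
        Arrays.sort(docValues, new Comparator<Long>() {
            @Override
            public int compare(Long a, Long b) {
                return Long.compare(a, b);
            }
        });
        System.out.println(Arrays.toString(docValues)); // [-7, 13, 42]
    }
}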
RE: solr bug feedback
This is already fixed in Solr 4.1!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

From: 虛客 [mailto:itemdet...@qq.com]
Sent: Wednesday, February 20, 2013 11:17 AM
To: solr-user
Subject: solr bug feedback

solr: 3.6.1 ---> Class: SolrRequestParsers ---> line: 75 has a mistake:
“long uploadLimitKB = 1048; // 2MB default”
should be
“long uploadLimitKB = 2048; // 2MB default”.
Thanks for open source!!!
RE: How to setup SimpleFSDirectoryFactory
Hi Geetha Anjali,

Lucene will not use MMapDirectory by default on 32 bit platforms or if you are not using an Oracle/Sun JVM. On 64 bit platforms, Lucene will use it, but will accept the risks of segfaulting when unmapping the buffers - Lucene does try its best to prevent this. It is a risk, but accepted by the Lucene developers.

To come back to your issue: It is perfectly fine on Solr/Lucene to not unmap all buffers as long as the index is open. The number of open file handles is another discussion, but not related at all to MMap. If you are using an old Lucene version (like 3.0.2), you should upgrade in all cases; the most recent one is 3.6.1.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: geetha anjali [mailto:anjaliprabh...@gmail.com]
> Sent: Monday, July 23, 2012 4:28 AM
> Subject: Re: How to setup SimpleFSDirectoryFactory
>
> Hi Uwe,
> Thanks Uwe, have you checked the bug in the JRE for MMapDirectory? I was
> mentioning this. This is posted on the Oracle site, and in the API doc.
> They accept this as a bug, have you seen this?
>
> "MMapDirectory <http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/store/MMapDirectory.html> uses
> memory-mapped IO when reading. This is a good choice if you have plenty of
> virtual memory relative to your index size, eg if you are running on a 64 bit JRE,
> or you are running on a 32 bit JRE but your index sizes are small enough to fit
> into the virtual memory space. Java has currently the limitation of not being
> able to unmap files from user code. The files are unmapped, when GC releases
> the byte buffers. *Due to this bug <http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4724038> in Sun's JRE,
> MMapDirectory's **IndexInput.close()* <http://lucene.apache.org/java/3_0_2/api/core/org/apache/lucene/store/IndexInput.html#close%28%29>
> * is unable to close the underlying OS file handle. Only when GC finally collects
> the underlying objects, which could be quite some time later, will the file
> handle be closed*. *This will consume additional transient disk usage*:
> on Windows, attempts to delete or overwrite the files will result in an
> exception; on other platforms, which typically have a "delete on last close"
> semantics, while such operations will succeed, the bytes are still consuming
> space on disk. For many applications this limitation is not a problem (e.g. if you
> have plenty of disk space, and you don't rely on overwriting files on Windows)
> but it's still an important limitation to be aware of. This class supplies a
> (possibly dangerous) workaround mentioned in the bug report, which may fail
> on non-Sun JVMs."
>
> Thanks,
>
> On Mon, Jul 23, 2012 at 4:13 AM, Uwe Schindler wrote:
>
> > It is hopeless to talk to both of you, you don't understand virtual memory:
> >
> > > I get a similar situation using Windows 2008 and Solr 3.6. Memory
> > > using mmap is never released. Even if I turn off traffic and commit
> > > and do a manual gc. If the size of the index is 3gb then memory used
> > > will be heap + 3gb of shared used. If I use a 6gb index I get heap + 6gb.
> >
> > That is expected, but we are talking not about allocated physical
> > memory, we are talking about allocated ADDRESS SPACE and you have 2^47
> > of that on 64bit platforms. There is no physical memory wasted or
> > allocated - please read the blog post a third, forth, fifth... or
> > tenth time, until it is obvious.
> > You should also go back to school and
> > take a course on system programming and operating system kernels.
> > Every CS student gets that taught in his first year (at least in
> > Germany).
> >
> > Java's GC has nothing to do with that - as long as the index is open,
> > ADDRESS SPACE is assigned. We are talking not about memory nor Java
> > heap space.
> >
> > > If I turn off
> > > MMapDirectoryFactory it goes back down. When is the MMap supposed to
> > > release memory? It only does it on JVM restart now.
> >
> > Can you please stop spreading nonsense about MMapDirectory with no
> > knowledge behind? http://www.linuxatemyram.com/ - Also applies to
> > Windows.
> >
> > Uwe
> >
> > > Bill Bell
> > > Sent from mobile
> > >
> > > On Jul 22, 2012, at 6:21 AM, geetha anjali wrote:
> > > > It happens in 3.6, for this reasons I thought of moving to solandra.
> > > > If I do a commit, the all documents are persisted without any
RE: How to setup SimpleFSDirectoryFactory
It is hopeless to talk to both of you, you don't understand virtual memory:

> I get a similar situation using Windows 2008 and Solr 3.6. Memory using
> mmap is never released. Even if I turn off traffic and commit and do a manual
> gc. If the size of the index is 3gb then memory used will be heap + 3gb of
> shared used. If I use a 6gb index I get heap + 6gb.

That is expected, but we are talking not about allocated physical memory, we are talking about allocated ADDRESS SPACE and you have 2^47 of that on 64bit platforms. There is no physical memory wasted or allocated - please read the blog post a third, forth, fifth... or tenth time, until it is obvious. You should also go back to school and take a course on system programming and operating system kernels. Every CS student gets that taught in his first year (at least in Germany).

Java's GC has nothing to do with that - as long as the index is open, ADDRESS SPACE is assigned. We are talking not about memory nor Java heap space.

> If I turn off
> MMapDirectoryFactory it goes back down. When is the MMap supposed to
> release memory? It only does it on JVM restart now.

Can you please stop spreading nonsense about MMapDirectory with no knowledge behind? http://www.linuxatemyram.com/ - Also applies to Windows.

Uwe

> Bill Bell
> Sent from mobile
>
> On Jul 22, 2012, at 6:21 AM, geetha anjali wrote:
> > It happens in 3.6, for this reasons I thought of moving to solandra.
> > If I do a commit, the all documents are persisted without any issues.
> > There is no issues in terms of any functionality, but only this happens:
> > increase in physical RAM, goes higher and higher and stops at maximum and it
> > never comes down.
> >
> > Thanks
> >
> > On Sun, Jul 22, 2012 at 3:38 AM, Lance Norskog wrote:
> >
> >> Interesting. Which version of Solr is this? What happens if you do a
> >> commit?
> >>
> >> On Sat, Jul 21, 2012 at 8:01 AM, geetha anjali wrote:
> >>> Hi Uwe,
> >>> Great to know. We have files indexing 1/min. After 30 mins I see all
> >>> my physical memory say its 100 percentage used (windows). On deep
> >>> investigation found that mmap is not releasing os file handles. Do you find this behaviour?
> >>>
> >>> Thanks
> >>>
> >>> On 20 Jul 2012 14:04, "Uwe Schindler" wrote:
> >>>
> >>> Hi Bill,
> >>>
> >>> MMapDirectory uses the file system cache of your operating system, which
> >>> has following consequences: In Linux, top & free should normally report only
> >>> *few* free memory, because the O/S uses all memory not allocated by
> >>> applications to cache disk I/O (and shows it as allocated, so having 0% free
> >>> memory is just normal on Linux and also Windows). If you have other
> >>> applications or Lucene/Solr itself that allocate lots of heap space or
> >>> malloc() a lot, then you are reducing free physical memory, so reducing fs
> >>> cache. This depends also on your swappiness parameter (if swappiness is
> >>> higher, inactive processes are swapped out easier, default is 60% on linux -
> >>> freeing more space for FS cache - the backside is of course that maybe
> >>> in-memory structures of Lucene and other applications get paged out).
> >>>
> >>> You will only see no paging at all if all memory allocated by all applications
> >>> + all mmapped files fit into memory. But paging in/out the mmapped Lucene
> >>> index is much cheaper than using SimpleFSDirectory or NIOFSDirectory.
> >>> If
> >>> you use SimpleFS or NIO and your index is not in FS cache, it will also read
> >>> it from physical disk again, so where is the difference. Paging is actually
> >>> cheaper as no syscalls are involved.
> >>>
> >>> If you want as much as possible of your index in physical RAM, copy it to
> >>> /dev/null regularly and buy more RUM :-)
> >>>
> >>> -
> >>> Uwe Schindler
> >>> H.-H.-Meier-Allee 63, D-28213 Bremen
> >>> http://www.thetaphi.de
> >>> eMail: uwe@thetaphi...
> >>>
> >>>> From: Bill Bell [mailto:billnb...@gmail.com]
> >>>> Sent: Friday, July 20, 2012 5:17 AM
> >>>> Subject: Re: ...
RE: RE: How to setup SimpleFSDirectoryFactory
Hi,

It seems that both of you simply don't understand what's happening in your operating system kernel. Please read the blog post again!

> It happens in 3.6, for this reasons I thought of moving to solandra.
> If I do a commit, the all documents are persisted without any issues.
> There is no issues in terms of any functionality, but only this happens:
> increase in physical RAM, goes higher and higher and stops at maximum and it
> never comes down.

This is perfectly fine on Windows and Linux (and any other operating system). If an operating system did not use *all* available physical memory, it would waste costly hardware resources. Why not use resources that are unused otherwise? As said before: the O/S kernel uses *all* available physical RAM for caching file system accesses. The memory used for that is always reported as not free, because it is used (very simple, right?). But if some other application wants to use it, it is free for malloc(), so it is not permanently occupied. That's always the case, using MMapDirectory or not (same for SimpleFSDirectory or NIOFSDirectory). Of course, when you freshly booted your kernel, it reports free memory, but definitely not on a server running 24/7 for weeks.

For all people who don't want to understand that, here is the easy explanation page: http://www.linuxatemyram.com/

> > > all my physical memory say its 100 percentage used (windows). On deep
> > > investigation found that mmap is not releasing os file handles. Do
> > > you find this behaviour?

One comment: The file handles are not freed as long as the index is open. Used file handles have nothing to do with memory mapping; the two are completely unrelated to each other.

Uwe

> On Sun, Jul 22, 2012 at 3:38 AM, Lance Norskog wrote:
>
> > Interesting. Which version of Solr is this? What happens if you do a
> > commit?
> >
> > On Sat, Jul 21, 2012 at 8:01 AM, geetha anjali wrote:
> > > Hi Uwe,
> > > Great to know. We have files indexing 1/min. After 30 mins I see
> > > all my physical memory say its 100 percentage used (windows). On deep
> > > investigation found that mmap is not releasing os file handles. Do
> > > you find this behaviour?
> > >
> > > Thanks
> > >
> > > On 20 Jul 2012 14:04, "Uwe Schindler" wrote:
> > >
> > > Hi Bill,
> > >
> > > MMapDirectory uses the file system cache of your operating system, which has
> > > following consequences: In Linux, top & free should normally report only
> > > *few* free memory, because the O/S uses all memory not allocated by
> > > applications to cache disk I/O (and shows it as allocated, so having 0% free
> > > memory is just normal on Linux and also Windows). If you have other
> > > applications or Lucene/Solr itself that allocate lots of heap space or
> > > malloc() a lot, then you are reducing free physical memory, so reducing fs
> > > cache. This depends also on your swappiness parameter (if swappiness is
> > > higher, inactive processes are swapped out easier, default is 60% on linux -
> > > freeing more space for FS cache - the backside is of course that maybe
> > > in-memory structures of Lucene and other applications get paged out).
> > >
> > > You will only see no paging at all if all memory allocated by all applications
> > > + all mmapped files fit into memory. But paging in/out the mmapped Lucene
> > > index is muuuuuch cheaper than using SimpleFSDirectory or NIOFSDirectory.
> > > If
> > > you use SimpleFS or NIO and your index is not in FS cache, it will also read
> > > it from physical disk again, so where is the difference. Paging is actually
> > > cheaper as no syscalls are involved.
> > >
> > > If you want as much as possible of your index in physical RAM, copy
> > > it to /dev/null regularly and buy more RUM :-)
> > >
> > > -
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > http://www.thetaphi.de
> > > eMail: uwe@thetaphi...
> > >
> > >> From: Bill Bell [mailto:billnb...@gmail.com]
> > >> Sent: Friday, July 20, 2012 5:17 AM
> > >> Subject: Re: ...
> > >> stop using it? The least used memory will be removed from the OS
> > >> automatically? I see some paging. Wouldn't paging slow down the querying?
> > >>
> > >> My index
[ANNOUNCE] Apache Solr 3.6.1 released
22 July 2012, Apache Solr™ 3.6.1 available

The Lucene PMC is pleased to announce the release of Apache Solr 3.6.1.

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

This release is a bug fix release for version 3.6.0. It contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://lucene.apache.org/solr/mirrors-solr-3x-redir.html (see note below). See the CHANGES.txt file included with the release for a full list of details.

Solr 3.6.1 Release Highlights:

* The concurrency of MMapDirectory was improved; this fixes a performance regression in comparison to Solr 3.5.0. The regression affected users on 64bit platforms (Linux, Solaris, Windows) or those explicitly using MMapDirectoryFactory.
* ReplicationHandler "maxNumberOfBackups" was fixed to work if backups are triggered on commit.
* Charset problems were fixed with HttpSolrServer, caused by an upgrade to a new Commons HttpClient version in 3.6.0.
* Grouping was fixed to return the correct count when not all shards are queried in the second pass. Solr no longer throws an Exception when using result grouping with main=true and wt=javabin.
* Config file replication was made less error prone.
* Data Import Handler threading fixes.
* Various minor bugs were fixed.

Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access.

Happy searching,
Uwe Schindler (release manager) & all Lucene/Solr developers

-
Uwe Schindler
uschind...@apache.org
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/
RE: How to setup SimpleFSDirectoryFactory
Hi Bill,

MMapDirectory uses the file system cache of your operating system, which has the following consequences: In Linux, top & free should normally report only *few* free memory, because the O/S uses all memory not allocated by applications to cache disk I/O (and shows it as allocated, so having 0% free memory is just normal on Linux and also Windows). If you have other applications or Lucene/Solr itself that allocate lots of heap space or malloc() a lot, then you are reducing free physical memory, so reducing fs cache. This depends also on your swappiness parameter (if swappiness is higher, inactive processes are swapped out easier, default is 60% on Linux - freeing more space for FS cache - the backside is of course that maybe in-memory structures of Lucene and other applications get paged out).

You will only see no paging at all if all memory allocated by all applications + all mmapped files fit into memory. But paging in/out the mmapped Lucene index is much cheaper than using SimpleFSDirectory or NIOFSDirectory. If you use SimpleFS or NIO and your index is not in FS cache, it will also read it from physical disk again, so where is the difference? Paging is actually cheaper as no syscalls are involved.

If you want as much as possible of your index in physical RAM, copy it to /dev/null regularly and buy more RUM :-)

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Bill Bell [mailto:billnb...@gmail.com]
> Sent: Friday, July 20, 2012 5:17 AM
> Subject: Re: How to setup SimpleFSDirectoryFactory
>
> Thanks. Are you saying that if we run low on memory, the MMapDirectory will
> stop using it? The least used memory will be removed from the OS
> automatically? I see some paging. Wouldn't paging slow down the querying?
>
> My index is 10gb and every 8 hours we get most of it in shared memory. The
> memory is 99 percent used, and that does not leave any room for other apps.
> Other implications?
>
> Sent from my mobile device
> 720-256-8076
>
> On Jul 19, 2012, at 9:49 AM, "Uwe Schindler" wrote:
>
> > Read this, then you will see that MMapDirectory will use 0% of your Java
> > Heap space or free system RAM:
> >
> > http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> >
> > Uwe
> >
> > -
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >> -Original Message-
> >> From: William Bell [mailto:billnb...@gmail.com]
> >> Sent: Tuesday, July 17, 2012 6:05 AM
> >> Subject: How to setup SimpleFSDirectoryFactory
> >>
> >> We all know that MMapDirectory is fastest. However we cannot always
> >> use it since you might run out of memory on large indexes right?
> >>
> >> Here is how I got SimpleFSDirectoryFactory to work. Just set
> >> -Dsolr.directoryFactory=solr.SimpleFSDirectoryFactory.
> >>
> >> Your solrconfig.xml:
> >>
> >> <directoryFactory name="DirectoryFactory"
> >> class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
> >>
> >> You can check it with http://localhost:8983/solr/admin/stats.jsp
> >>
> >> Notice that the default for Windows 64bit is MMapDirectory. Else
> >> NIOFSDirectory except for Windows. It would be nicer if we just
> >> set it all up with a helper in solrconfig.xml...
> >>
> >> if (Constants.WINDOWS) {
> >>   if (MMapDirectory.UNMAP_SUPPORTED && Constants.JRE_IS_64BIT)
> >>     return new MMapDirectory(path, lockFactory);
> >>   else
> >>     return new SimpleFSDirectory(path, lockFactory);
> >> } else {
> >>   return new NIOFSDirectory(path, lockFactory);
> >> }
> >>
> >> --
> >> Bill Bell
> >> billnb...@gmail.com
> >> cell 720-256-8076
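The address-space point from this thread can be observed directly; the following hedged stand-alone sketch (not Lucene code) maps a file and shows that the Java heap footprint barely changes, because the mapping lives in kernel-managed virtual address space:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MMapHeapDemo {
    public static void main(String[] args) throws Exception {
        Runtime rt = Runtime.getRuntime();
        long heapBefore = rt.totalMemory() - rt.freeMemory();
        RandomAccessFile raf = new RandomAccessFile(args[0], "r"); // pass any large file
        FileChannel ch = raf.getChannel();
        long size = Math.min(ch.size(), Integer.MAX_VALUE); // a single mapping is limited to 2 GiB
        // map() reserves address space only; physical pages are faulted in on
        // access and reclaimed by the kernel under memory pressure.
        MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
        buf.get(0); // touch one page so at least something is paged in
        long heapAfter = rt.totalMemory() - rt.freeMemory();
        System.out.println("mapped bytes: " + size);
        System.out.println("heap delta:   " + (heapAfter - heapBefore) + " (roughly zero)");
        ch.close();
        raf.close();
    }
}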
RE: How to setup SimpleFSDirectoryFactory
Read this, then you will see that MMapDirectory will use 0% of your Java Heap space or free system RAM:

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: William Bell [mailto:billnb...@gmail.com]
> Sent: Tuesday, July 17, 2012 6:05 AM
> Subject: How to setup SimpleFSDirectoryFactory
>
> We all know that MMapDirectory is fastest. However we cannot always use it
> since you might run out of memory on large indexes right?
>
> Here is how I got SimpleFSDirectoryFactory to work. Just set
> -Dsolr.directoryFactory=solr.SimpleFSDirectoryFactory.
>
> Your solrconfig.xml:
>
> <directoryFactory name="DirectoryFactory"
> class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
>
> You can check it with http://localhost:8983/solr/admin/stats.jsp
>
> Notice that the default for Windows 64bit is MMapDirectory. Else
> NIOFSDirectory except for Windows. It would be nicer if we just set it all up
> with a helper in solrconfig.xml...
>
> if (Constants.WINDOWS) {
>   if (MMapDirectory.UNMAP_SUPPORTED && Constants.JRE_IS_64BIT)
>     return new MMapDirectory(path, lockFactory);
>   else
>     return new SimpleFSDirectory(path, lockFactory);
> } else {
>   return new NIOFSDirectory(path, lockFactory);
> }
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
Java 7u1 fixes index corruption and crash bugs in Apache Lucene Core and Apache Solr
Hi users of Apache Lucene Core and Apache Solr,

Oracle released Java 7u1 [1] on October 19. According to the release notes and tests done by the Lucene committers, all bugs reported on July 28 are fixed in this release, so code using the Porter stemmer no longer crashes with SIGSEGV. We were not able to reproduce any index corruption anymore, so it is safe to use Java 7u1 with Lucene Core and Solr.

On the same day, Oracle released Java 6u29 [2], fixing the same problems occurring with Java 6, if the JVM switches -XX:+AggressiveOpts or -XX:+OptimizeStringConcat were used. Of course, you should not use experimental JVM options like -XX:+AggressiveOpts in production environments! We recommend everybody to upgrade to this latest version 6u29.

In case you upgrade to Java 7, remember that you may have to reindex, as the unicode version shipped with Java 7 changed and tokenization behaves differently (e.g. lowercasing). For more information, read JRE_VERSION_MIGRATION.txt in your distribution package!

On behalf of the Apache Lucene/Solr committers,
Uwe Schindler

[1] http://www.oracle.com/technetwork/java/javase/7u1-relnotes-507962.html
[2] http://www.oracle.com/technetwork/java/javase/6u29-relnotes-507960.html

-
Uwe Schindler
uschind...@apache.org
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/
[WARNING] Index corruption and crashes in Apache Lucene Core / Apache Solr with Java 7
Hello Apache Lucene & Apache Solr users,
Hello users of other Java-based Apache projects,

Oracle released Java 7 today. Unfortunately it contains hotspot compiler optimizations which miscompile some loops. This can affect code of several Apache projects. Sometimes JVMs only crash, but in several cases the results calculated can be incorrect, leading to bugs in applications (see Hotspot bugs 7070134 [1], 7044738 [2], 7068051 [3]).

Apache Lucene Core and Apache Solr are two Apache projects which are affected by these bugs, namely all versions released until today. Solr users with the default configuration will have Java crashing with SIGSEGV as soon as they start to index documents, as one affected part is the well-known Porter stemmer (see LUCENE-3335 [4]). Other loops in Lucene may be miscompiled, too, leading to index corruption (especially on Lucene trunk with the pulsing codec; other loops may be affected, too - LUCENE-3346 [5]).

These problems were detected only 5 days before the official Java 7 release, so Oracle had no time to fix those bugs, affecting also many more applications. In response to our questions, they proposed to include the fixes into service release u2 (eventually into service release u1, see [6]). This means you cannot use Apache Lucene/Solr with Java 7 releases before Update 2! If you do, please don't open bug reports; it is not the committers' fault! At least disable loop optimizations using the -XX:-UseLoopPredicate JVM option to not risk index corruption.

Please note: Java 6 users are also affected, if they use one of those JVM options, which are not enabled by default: -XX:+OptimizeStringConcat or -XX:+AggressiveOpts. It is strongly recommended not to use any hotspot optimization switches in any Java version without extensive testing!

In case you upgrade to Java 7, remember that you may have to reindex, as the unicode version shipped with Java 7 changed and tokenization behaves differently (e.g. lowercasing). For more information, read JRE_VERSION_MIGRATION.txt in your distribution package!

On behalf of the Lucene project,
Uwe

[1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7070134
[2] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7044738
[3] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7068051
[4] https://issues.apache.org/jira/browse/LUCENE-3335
[5] https://issues.apache.org/jira/browse/LUCENE-3346
[6] http://s.apache.org/StQ

-
Uwe Schindler
uschind...@apache.org
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/
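Applications that cannot control which JVM they are deployed on can add a defensive startup check; a hedged sketch (the rejected version prefix is an assumption to adapt once Oracle ships the fixed service release):

public class JvmGuard {
    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        // Assumption: treat every 1.7.0 build before the promised u1/u2 fixes
        // as affected; relax this check once a fixed release is installed.
        if (version.startsWith("1.7.0")) {
            System.err.println("WARNING: Java 7 GA miscompiles loops (see LUCENE-3335/LUCENE-3346).");
            System.err.println("Run with -XX:-UseLoopPredicate, or stay on Java 6 without");
            System.err.println("-XX:+OptimizeStringConcat / -XX:+AggressiveOpts.");
        }
    }
}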
RE: Solr 3.1 / Java 1.5: Exception regarding analyzer implementation
Hi,

> On 09.05.11 11:04, Martin Jansen wrote:
> > I just attempted to set up an instance of Solr 3.1 in Tomcat 5.5
> > running in Java 1.5. It fails with the following exception on start-up:
> >
> >> java.lang.AssertionError: Analyzer implementation classes or at least
> >> their tokenStream() and reusableTokenStream() implementations must be
> >> final at org.apache.lucene.analysis.Analyzer.assertFinal(Analyzer.java:57)
>
> In the meantime I solved the issue by installing Java 1.6. Works without a
> problem now, but I'm wondering if Solr 3.1 is intentionally incompatible with
> Java 1.5 or if it happened by mistake.

Solr 3.1 is compatible with Java 1.5 and runs fine with it. The exception you are seeing should not happen for Analyzers that are shipped with Solr/Lucene; it can only happen if you wrote your own Analyzer/TokenStreams that are not declared final as requested. In that case the error will also happen with Java 6.

BUT: This is only an assertion to make development and debugging easier. Assertions should not run in production mode, as they may affect performance (seriously)! You should check your java command line for -ea parameters and remove them in production.

The reason why this assert hits you in one of your Tomcat installations could also be related to some instrumentation tools you have enabled in this Tomcat. Lots of instrumentation tools may dynamically change class bytecode and e.g. make classes un-final. In that case the assertion of course fails (with assertions enabled).

Before saying Solr 3.1 is not compatible with Java 1.5:
- Disable assertions in production (by removing -ea command line parameters, see http://download.oracle.com/javase/1.4.2/docs/guide/lang/assert.html)
- Check your configuration for any instrumentation that may be enabled.

Both of the above points may not affect you on the other server that runs fine with Java 6.

Uwe
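For reference, the contract the assertion enforces looks like this for a custom Lucene 3.x analyzer; a minimal hedged sketch (the tokenizer/filter chain is just an example):

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

// Declaring the class (or at least its tokenStream()/reusableTokenStream()
// methods) final satisfies the check in Analyzer.assertFinal().
public final class MyAnalyzer extends Analyzer {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        return new LowerCaseFilter(Version.LUCENE_31,
                new WhitespaceTokenizer(Version.LUCENE_31, reader));
    }
}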