[jira] Commented: (SOLR-84) Logo Contests
[ https://issues.apache.org/jira/browse/SOLR-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648645#action_12648645 ]

Doug Cutting commented on SOLR-84:
----------------------------------

> I'm liking
> https://issues.apache.org/jira/secure/attachment/12393951/sslogo-solr-classic.png

Me too.

> Logo Contests
> -------------
>
>         Key: SOLR-84
>         URL: https://issues.apache.org/jira/browse/SOLR-84
>     Project: Solr
>  Issue Type: Improvement
>    Reporter: Bertrand Delacretaz
>    Priority: Minor
> Attachments: apache-solr-004.png, apache_solr_burning.png, apache_solr_contour.png, apache_solr_sun.png, logo-grid.jpg, logo-solr-d.jpg, logo-solr-e.jpg, logo-solr-source-files-take2.zip, logo_remake.jpg, logo_remake.svg, solr-84-source-files.zip, solr-f.jpg, solr-greyscale.png, solr-logo-20061214.jpg, solr-logo-20061218.JPG, solr-logo-20070124.JPG, solr-logo.jpg, solr-logo.png, solr-nick.gif, solr.jpg, solr.png, solr.s1.jpg, solr.s2.jpg, solr.s3.jpg, solr.svg, solr_attempt.jpg, solr_attempt2.jpg, solrlogo.jpg, sslogo-solr-70s.png, sslogo-solr-classic.png, sslogo-solr-dance.png, sslogo-solr-fiesta.png, sslogo-solr-finder2.0.png
>
> This issue was originally a scratch pad for various ideas for new logos. It is now being used as a repository for submissions for the Solr Logo Contest... http://wiki.apache.org/solr/LogoContest
>
> Note that many of the images currently attached are not eligible for the contest since they do not meet the official guidelines for new Apache project logos (in particular, the full project name "Apache Solr" must be included in the logo). Only eligible attachments will be included in the official voting.
[jira] Commented: (SOLR-84) Logo Contests
[ https://issues.apache.org/jira/browse/SOLR-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12647660#action_12647660 ]

Doug Cutting commented on SOLR-84:
----------------------------------

I like https://issues.apache.org/jira/secure/attachment/12349896/logo-solr-e.jpg and https://issues.apache.org/jira/secure/attachment/12358494/sslogo-solr.jpg, because they're simple and scale down well.

It should be possible to scale the logo, or a salient part of it, down as small as a favicon (16x16) and still have it easily recognized. Most of the designs above require a lot of pixels to be recognizable. A good logo should be iconic more than textual--an abstract symbol. Often you can sample an element of a logo to form a favicon (like we do with Lucene's 'L'). So, when voting, think about whether there's an easily identifiable sample (e.g., is the typeface of the 'S' distinctive?).

I note that Steve Stedman did not provide his logos under the Apache license. Was that intentional? I like his quite a lot...

> Logo Contests
> -------------
>
>         Key: SOLR-84
>         URL: https://issues.apache.org/jira/browse/SOLR-84
>     Project: Solr
>  Issue Type: Improvement
>    Reporter: Bertrand Delacretaz
>    Priority: Minor
> Attachments: apache-solr-004.png, apache_solr_burning.png, apache_solr_contour.png, apache_solr_sun.png, logo-grid.jpg, logo-solr-d.jpg, logo-solr-e.jpg, logo-solr-source-files-take2.zip, logo_remake.jpg, logo_remake.svg, solr-84-source-files.zip, solr-f.jpg, solr-greyscale.png, solr-logo-20061214.jpg, solr-logo-20061218.JPG, solr-logo-20070124.JPG, solr-logo.jpg, solr-nick.gif, solr.jpg, solr.png, solr.s1.jpg, solr.s2.jpg, solr.s3.jpg, solr.svg, solr_attempt.jpg, solr_attempt2.jpg, sslogo-solr-flare.jpg, sslogo-solr.jpg, sslogo-solr2-flare.jpg, sslogo-solr2.jpg, sslogo-solr3.jpg
Re: [VOTE] release rc4 as Solr 1.2
+1 This looks good to me.

Doug

Yonik Seeley wrote:
> OK, this one is it! Please vote to release the artifacts at
> http://people.apache.org/~yonik/staging_area/solr/1.2rc4/
> as Apache Solr 1.2
>
> +1
>
> -Yonik
Re: [Solr Wiki] Update of TaskList by YonikSeeley
Apache Wiki wrote:
> * have everyone update their subversion working directories (remember to
>   update SVN paths in IDEs too, etc)

Note that 'svn switch' makes this easy.

Doug
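For example, an existing working copy can be repointed in place (the post-move URL below is a guess, not taken from the message):

    # repoint an existing working copy at the repository's new location
    # (hypothetical post-graduation path; substitute the real one)
    svn switch https://svn.apache.org/repos/asf/lucene/solr/trunk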
Re: [Solr Wiki] Update of TaskList by YonikSeeley
Apache Wiki wrote:
> * move website
> * checkout in new location (from the new svn location too)

Note that you can update the .htaccess file in /www/incubator.apache.org/solr to redirect the old site to the new site.

http://svn.apache.org/repos/asf/incubator/public/trunk/site-publish/.htaccess

Doug
Solr on Lucene home page?
Should Solr have a tab on Lucene's home page? Other incubating Lucene-related projects do. I think it would be appropriate. Doug
Re: BufferingTokenStream and RemoveDups
This would be useful for implementing an N-gram filter. I'd support adding something like this to the Lucene core.

Doug

Yonik Seeley wrote:
> Just brainstorming... Here's a completely untested prototype of what a
> BufferingTokenStream might look like, and a possible implementation of
> removing duplicate tokens on top of it.
>
> -Yonik

import java.io.IOException;
import java.util.LinkedList;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

class RemoveDuplicatesTokenFilter extends BufferedTokenStream {
  public RemoveDuplicatesTokenFilter(TokenStream input) {super(input);}

  public Token process(Token t) throws IOException {
    Token tok = read();
    while (tok != null && tok.getPositionIncrement() == 0) {
      boolean dup = false;
      for (Token outTok : output()) {
        if (outTok.termText().equals(tok.termText())) {
          dup = true;
          break;
        }
      }
      if (!dup) write(tok);
      tok = read(); // advance, or this loop would never terminate
    }
    if (tok != null) pushBack(tok);
    return t;
  }
}

/**
 * Handles input and output buffering of a TokenStream.
 *
 * <pre>
 * // Example of a class implementing the rule "A" "B" => "Q"
 * class MyTokenStream extends BufferedTokenStream {
 *   public MyTokenStream(TokenStream input) {super(input);}
 *   public Token process(Token t) throws IOException {
 *     if ("A".equals(t.termText())) {
 *       Token t2 = read();
 *       if (t2 != null && "B".equals(t2.termText())) t.setTermText("Q");
 *       if (t2 != null) pushBack(t2);
 *     }
 *     return t;
 *   }
 * }
 * </pre>
 *
 * <pre>
 * // Example of a class implementing "A" "B" => "A" "A" "B"
 * class MyTokenStream extends BufferedTokenStream {
 *   public MyTokenStream(TokenStream input) {super(input);}
 *   public Token process(Token t) throws IOException {
 *     if ("A".equals(t.termText()) && "B".equals(peek(1).termText())) write(t);
 *     return t;
 *   }
 * }
 * </pre>
 *
 * @author yonik
 * @version $Id$
 */
abstract class BufferedTokenStream extends TokenStream {
  // in the future, might be faster if implemented as an array-based CircularQueue
  private final LinkedList<Token> inQueue = new LinkedList<Token>();
  private final LinkedList<Token> outQueue = new LinkedList<Token>();
  private final TokenStream input;

  public BufferedTokenStream(TokenStream input) {
    this.input = input;
  }

  /** Process a token. Subclasses may read more tokens from the input stream,
   * write more tokens to the output stream, or simply return the next token
   * to be output. Subclasses may return null if the token is to be dropped.
   * If a subclass writes tokens to the output stream and returns a non-null
   * Token, the returned Token is considered to be at the head of the token
   * output stream.
   */
  public abstract Token process(Token t) throws IOException;

  public final Token next() throws IOException {
    while (true) {
      if (!outQueue.isEmpty()) return outQueue.removeFirst();
      Token t = inQueue.isEmpty() ? input.next() : inQueue.removeFirst();
      if (t == null) return null;
      Token out = process(t);
      if (out != null) return out;
      // loop back to top in case process() put something on the output queue
    }
  }

  /** read (and consume) a token from the input stream */
  public Token read() throws IOException {
    if (inQueue.isEmpty()) {
      return input.next();
    }
    return inQueue.removeFirst(); // consume it, or pushBack()+read() would loop
  }

  /** push a token back into the input stream */
  public void pushBack(Token t) {
    inQueue.addFirst(t);
  }

  /** peek n tokens ahead in the stream (1 based... 0 is invalid) */
  public Token peek(int n) throws IOException {
    int fillCount = n - inQueue.size();
    for (int i = 0; i < fillCount; i++) {
      Token t = input.next();
      if (t == null) return null;
      inQueue.add(t);
    }
    return inQueue.get(n - 1);
  }

  /** write a token to the output stream */
  public void write(Token t) {
    outQueue.add(t);
  }

  Iterable<Token> output() {
    return outQueue;
  }
}
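To make the N-gram remark concrete, here is a minimal sketch (mine, not from the thread) of a bigram filter built on the BufferedTokenStream above; the class name and the underscore join convention are assumptions:

// Hypothetical bigram filter (not from the original thread): after each
// token it queues a synthetic "word1_word2" token at the same position,
// using the peek()/write() primitives of BufferedTokenStream.
class BigramTokenFilter extends BufferedTokenStream {
  public BigramTokenFilter(TokenStream input) { super(input); }

  public Token process(Token t) throws IOException {
    Token next = peek(1); // look ahead without consuming
    if (next != null) {
      Token bigram = new Token(t.termText() + "_" + next.termText(),
                               t.startOffset(), next.endOffset());
      bigram.setPositionIncrement(0); // stack it on the current position
      write(bigram); // next() emits t first, then the queued bigram
    }
    return t;
  }
}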
Re: GData, updateable IndexSearcher
jason rutherglen wrote:
> Interesting, does this mean there is a plan for incrementally updateable
> IndexSearchers to become part of Lucene?

In general, there is no plan for Lucene. If someone implements a generally useful, efficient feature in a back-compatible, easy-to-use manner, and submits it as a patch, then it becomes a part of Lucene. That's the way Lucene changes. Since we don't pay anyone, we can't make plans and assign tasks. So if you're particularly interested in this feature, you might search the archives to find past efforts, or simply try to implement it yourself.

I think a good approach would be to create a new IndexSearcher instance based on an existing one, that shares IndexReaders. Similarly, one should be able to create a new IndexReader based on an existing one. This would be a MultiReader that shares many of the same SegmentReaders.

Things get a little tricky after this. Lucene caches filters based on the IndexReader, so filters would need to be re-created. Ideally these could be incrementally re-created, but that might be difficult. What might be simpler would be to use a MultiSearcher constructed with an IndexSearcher per SegmentReader, avoiding the use of MultiReader. Then the caches would still work.

This would require making a few things public that are not at present. Perhaps adding a 'MultiReader.getSubReaders()' method, combined with a 'static IndexReader.reopen(IndexReader)' method. The latter would return a new MultiReader that shared SegmentReaders with the old version. Then one could use getSubReaders() on the new MultiReader to extract the current set to use when constructing a MultiSearcher. Another tricky bit is figuring out when to close readers.

Does this make sense? This discussion should probably move to the lucene-dev list.

> Are there any negatives to updateable IndexSearchers?

Not if implemented well!

Doug
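For concreteness, a sketch of how the pieces described above might fit together. Note that IndexReader.reopen(IndexReader) and MultiReader.getSubReaders() are the *proposed* methods from this message, not existing Lucene APIs, so this is only an illustration of the idea:

// Sketch only: reopen() and getSubReaders() are the hypothetical additions
// proposed above; everything else is standard Lucene of that era.
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searchable;

public class SearcherReopener {
  public static MultiSearcher reopen(MultiReader current) throws IOException {
    // Hypothetical: returns a new MultiReader sharing unchanged SegmentReaders
    MultiReader updated = (MultiReader) IndexReader.reopen(current);

    // Hypothetical: expose the per-segment readers of the new MultiReader
    IndexReader[] subReaders = updated.getSubReaders();

    // One IndexSearcher per segment, so per-reader filter caches stay valid
    Searchable[] searchers = new Searchable[subReaders.length];
    for (int i = 0; i < subReaders.length; i++) {
      searchers[i] = new IndexSearcher(subReaders[i]);
    }
    return new MultiSearcher(searchers);
  }
}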
Re: GData
Ian Holsman wrote:
> I noticed you guys have created a 'gdata-lucene' server in the SoC project.
> Are you planning on doing this via SoLR? or is it something brand new?

We decided that doing this via Solr would probably make it more complicated. A simple, standalone GData server built using just Lucene is what we had in mind for the SoC project. This could then become a Lucene contrib module.

Doug
Re: GData
jason rutherglen wrote:
> Is a faster method of loading or updating the IndexSearcher something that
> makes sense for Lucene?

Yes. Folks have developed incrementally updateable IndexSearchers before, but none is yet part of Lucene.

> Or just assume the Google architecture is a lot more complex.

That's probably a safe assumption. Their architecture is designed to support real-time things like calendars, email, etc. Search engines, Lucene's normal domain, are not usually real-time, but have indexing delays.

Doug
Re: GData
jason rutherglen wrote:
> Ah ok, think I found it: org.apache.nutch.indexer.FsDirectory no? Couldn't
> this be used in Solr and distribute all the data rather than master/slave it?

It's possible to search a Lucene index that lives in Hadoop's DFS, but not recommended: it's very slow. It's much faster to copy the index to a local drive.

The rsync approach, of only transmitting index diffs, is a very efficient way to distribute an index. In particular, it supports scaling the number of *readers* well. For read/write stuff (e.g. a calendar) such scaling might not be paramount. Rather, you might be happy to route all requests for a particular calendar to a particular server. The index/database could still be somehow replicated/synced, in case that server dies, but a single server can probably handle all requests for a particular index/database. And keeping things coherent is much simpler in this case.

Doug
GData
How hard would it be to build a GData server using Solr? An open-source, Lucene-based GData server would be a good thing to have. Does this fit in Solr, or should it be a separate project?

http://code.google.com/apis/gdata/overview.html

Another summer of code project?

Doug
Re: GData
Yoav Shapira wrote:
> Getting back to Doug's original point about this as a possible SoC project:
> it seems a little too big from the technical discussion so far.

It might actually be a simpler project if it were standalone: not built into Solr, but rather a Lucene contrib project. One only has to write a few servlets that translate each request into Lucene events: add, delete, delete+add, or query. It wouldn't have lots of Solr's fancy features (faceted searching, replication, etc.) but could still be a very useful thing.

Do folks think that would be a tractable SoC project?

Doug
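As a rough illustration of the "few servlets" idea (entirely a sketch of mine, not code from the thread; the parameter names, single-field document layout, and index path are assumptions), one such servlet handling add and delete might look like:

// Hypothetical sketch: translates HTTP requests into Lucene index events.
// Parameter names ("id", "content") and the index location are invented.
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class EntryServlet extends HttpServlet {
  private static final String INDEX_DIR = "/path/to/index"; // assumption

  // POST = add (delete+add when the id already exists)
  public void doPost(HttpServletRequest req, HttpServletResponse res)
      throws IOException {
    String id = req.getParameter("id");
    deleteById(id); // delete+add semantics for updates
    IndexWriter writer =
        new IndexWriter(INDEX_DIR, new StandardAnalyzer(), false);
    try {
      Document doc = new Document();
      doc.add(new Field("id", id, Field.Store.YES, Field.Index.UN_TOKENIZED));
      doc.add(new Field("content", req.getParameter("content"),
                        Field.Store.YES, Field.Index.TOKENIZED));
      writer.addDocument(doc);
    } finally {
      writer.close();
    }
  }

  // DELETE = remove the entry with the given id
  public void doDelete(HttpServletRequest req, HttpServletResponse res)
      throws IOException {
    deleteById(req.getParameter("id"));
  }

  private void deleteById(String id) throws IOException {
    IndexReader reader = IndexReader.open(INDEX_DIR);
    try {
      reader.deleteDocuments(new Term("id", id));
    } finally {
      reader.close();
    }
  }
}

Opening and closing the writer per request keeps the sketch simple; a real implementation would pool them.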
Re: summer of code: solr for apache mail archives
Ian Holsman wrote:
> If you could get it so that it interfaces with mod-mbox (what they currently
> use) that would be a better solution for the ASF infrastructure I think.

I assume that search results would be displayed with mod-mbox, i.e., links in the hit list would be links to mail-archive.a.o. Is that what you mean?

Also, in my original message I said:
> We can setup a notification mechanism for new messages with Apache
> infrastructure.

I now note that mod_mbox provides Atom feeds for each list, so we can just poll those to index new messages. We could generate the current list of feeds by scraping http://mail-archives.apache.org/mod_mbox/.

Doug
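A minimal sketch (mine, not from the thread) of such a poller: it fetches a list's Atom feed and posts each entry to Solr's XML update endpoint. The feed URL pattern, the field names, and the localhost Solr location are all assumptions:

// Hypothetical feed-polling indexer. The mod_mbox feed URL, the Solr
// location, and the schema fields are assumptions for illustration.
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class MboxFeedPoller {
  private static final String SOLR_UPDATE = "http://localhost:8983/solr/update";

  public static void main(String[] args) throws Exception {
    String feedUrl = args[0]; // e.g. a per-list mod_mbox Atom feed (assumed)
    InputStream in = new URL(feedUrl).openStream();
    Document feed = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().parse(in);

    NodeList entries = feed.getElementsByTagName("entry");
    for (int i = 0; i < entries.getLength(); i++) {
      Element entry = (Element) entries.item(i);
      // field names must match the Solr schema; these are invented
      String doc = "<add><doc>"
          + "<field name=\"id\">" + escape(text(entry, "id")) + "</field>"
          + "<field name=\"subject\">" + escape(text(entry, "title")) + "</field>"
          + "</doc></add>";
      post(doc);
    }
  }

  private static String text(Element parent, String tag) {
    NodeList list = parent.getElementsByTagName(tag);
    return list.getLength() > 0 ? list.item(0).getTextContent() : "";
  }

  private static String escape(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }

  private static void post(String xml) throws Exception {
    HttpURLConnection conn =
        (HttpURLConnection) new URL(SOLR_UPDATE).openConnection();
    conn.setDoOutput(true);
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
    OutputStream out = conn.getOutputStream();
    out.write(xml.getBytes("UTF-8"));
    out.close();
    conn.getResponseCode(); // complete the request, ignoring the body
  }
}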
solar - solr
There are still a lot of references to Solar, both in the sources and in the documentation. I just fixed a few of the obvious ones in the documentation. Doug
Re: svn commit: r378133 - /incubator/solr/trunk/build.xml
[EMAIL PROTECTED] wrote:
> add example and dist-example targets to build.xml

So you think the example should be a separate download, not part of the standard download? I find multiple downloads confusing and think it would be better to focus on a single download that includes a demo. Do you disagree?

Doug
Re: forrest version?
Yoav Shapira wrote:
> OK, consistency is a good value. I still think it's more of a pain than it's
> worth, unless Nutch/Lucene are using Forrest features that can't be done far
> more simply. We could achieve the same look and feel via CSS skinning I'd
> imagine... But since I don't have much bandwidth to spend on this, I don't
> want to lobby too hard: if you're comfortable with Forrest for consistency's
> sake, that's cool.

I thought that Forrest would be simpler, since we could just clone stuff from Nutch. I'm not in love with Forrest. If you think another way would be simpler yet, have at it. I don't think the look-and-feel needs to match other projects too closely.

For the record, I used forrest-0.6 on Nutch and the Lucene TLP.

Doug
Re: Things to do
Ian Holsman wrote:
> Doug mentioned that we could use the lucene 'zone' to get a working demo of
> Solr, which I think we should start as soon as we can get a build going.

We certainly could do this, but I'm not sure that we should. A real Solr demo will be read/write, and I'm not sure we want to support a read/write demo on lucene.zones.apache.org, with random folks on the internet able to write to it.

What would be really cool to build on lucene.zones.apache.org using Solr is a mail-archive search app. mail-archives.apache.org supports RSS feeds for all mailing lists, so we could simply write a daemon that periodically polls the feed for each list and stuffs all new messages into a Solr index.

Doug
Re: svn commit: r373402 - in /incubator/solr/trunk/src/test/org: ./ apache/ apache/solr/ apache/solr/analysis/ apache/solr/analysis/TestSynonymFilter.java
Yonik Seeley wrote:
> Many of the things in the lucene package (FunctionQuery and SynonymFilter)
> could be moved to org.apache.solr, and renamed to org.apache.lucene if/when
> they officially become part of Lucene. But the other reason for the
> org.apache.lucene package is for accessing package-protected Lucene stuff.
> Currently there is just PublicFieldSortedHitQueue, but there was more when
> we used Lucene 1.4.

Everything that's not required to be in a Lucene package for access reasons should be in org.apache.solr. And we should try to fix Lucene so that nothing has to be in its packages.

Doug