Re: Solr does not receive any documents by nutch

2016-02-26 Thread Merlin Morgenstern
During the nutch run there seem to be no activities in the Solr logfile. However, the logging view in the admin interface shows the following:

2/26/2016, 5:20:04 PM WARN null SolrConfig Couldn't add files from
/usr/local/Cellar/solr/5.4.1/contrib/extraction/lib filtered by .*\.jar to
classpath: /usr/local/Cellar/solr/5.4.1/contrib/extraction/lib
2/26/2016, 5:20:04 PM WARN null SolrConfig Couldn't add files from
/usr/local/Cellar/solr/5.4.1/dist filtered by solr-cell-\d.*\.jar to
classpath: /usr/local/Cellar/solr/5.4.1/dist
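For context, these warnings seem to correspond to the <lib> directives near the top of solrconfig.xml. The stock 5.x config contains lines roughly like the following (the solr.install.dir default shown is an assumption about the layout, not copied from my install):

  <!-- lib directives that trigger the warnings above when the directories
       are missing or contain no matching jars -->
  <lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />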

2016-02-26 17:14 GMT+01:00 Shawn Heisey :

> On 2/26/2016 8:34 AM, Merlin Morgenstern wrote:
> > Unfortunately no documents get added to Solr and no error log entries
> > show up. It seems as if it were working, but the documents are not there.
>
> Is there anything happening in the Solr logfile at all during the nutch
> run?  I'm talking about any activity at all, not just errors.
>
> For the 4.x install I have no way to know where your logfile is, but if
> you used the service installer script for the newer version, the log
> will usually be in /var/solr/logs.
>
> Thanks,
> Shawn
>
>


Solr does not receive any documents by nutch

2016-02-26 Thread Merlin Morgenstern
I have nutch 1.11 installed together with solr 4.10.4 AND solr 5.4.1 on OS
X 10.11.

Nutch and Solr seem to be working: Nutch starts to index, and Solr shows the
admin interface together with the configured core.

Unfortunately no documents get added to Solr and no error log entries show
up. It seems as if it were working, but the documents are not there.

The command I am using is:

./bin/crawl -i -D solr.server.url=http://localhost:8983/solr/collection1
urls/ crawl/ 1

AND

./bin/crawl -i -D solr.server.url=http://localhost:8984/solr/crawler urls/
crawl/ 1

Nutch starts to crawl the one configured domain and shows "fetching ...". I
then stop the process (Ctrl-C) after about 3 minutes, as it would otherwise
crawl the entire domain.

The Solr schema is configured, and all the other configuration steps found in
the tutorials have been done as well.
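As a quick sanity check (assuming the core names from the commands above), the document count can be queried directly:

  # numFound in the response shows whether anything was indexed at all
  curl "http://localhost:8983/solr/collection1/select?q=*:*&rows=0&wt=json"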

How could I approach this problem and get started? Thank you in advance for
any help!


How to check Zookeeper ensemble status?

2015-09-18 Thread Merlin Morgenstern
I am running a 3-node ZooKeeper ensemble on 3 machines dedicated to
SolrCloud 5.2.x.

Inside the Solr Admin-UI I can check "live nodes", but how can I check if
all three zookeeper nodes are up?

I am asking because ZooKeeper on node2 sits at 25% CPU usage while being
idle, and I wonder what the cause is. Maybe ZooKeeper cannot connect to the
other nodes, or something else entirely; in any case, this brought me to the
question of how to check whether all 3 nodes are operational.
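In case it is useful, a sketch of how each ensemble member could be checked directly (host names, ports and the zkServer.sh path are assumptions for a stock zookeeperd install):

  # ZooKeeper four-letter-word commands, one per node
  echo ruok | nc zk1 2181    # should answer "imok"
  echo stat | nc zk1 2181    # shows Mode: leader/follower plus connection counts
  # or via the bundled script
  /usr/share/zookeeper/bin/zkServer.sh status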

Thank you for any help on this!


Re: How to secure Admin UI with Basic Auth in Solr 5.3.x

2015-09-11 Thread Merlin Morgenstern
Thank you for the info.

I have already downgraded to 5.2.x, as this is a production setup.
Unfortunately I have the same trouble there. Any suggestions on how to fix
this? What is the recommended procedure for securing the admin UI on
production setups?

2015-09-11 14:26 GMT+02:00 Noble Paul :

> There were some bugs with the 5.3.0 release and 5.3.1 is in the
> process of getting released.
>
> try out the option #2 with the RC here
>
>
> https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-5.3.1-RC1-rev1702389/solr/
>
>
>
> On Fri, Sep 11, 2015 at 5:16 PM, Merlin Morgenstern
>  wrote:
> > OK, I downgraded to Solr 5.2.x.
> >
> > Unfortunately still no luck. I followed two approaches:
> >
> > 1. Secure it the old-fashioned way, as described here:
> > http://stackoverflow.com/questions/28043957/how-to-set-apache-solr-admin-password
> >
> > 2. Use the Basic Authentication Plugin, as described here:
> > http://lucidworks.com/blog/securing-solr-basic-auth-permission-rules/
> >
> > Both approaches led to unsolved problems.
> >
> > While following option 1, I was able to secure the Admin UI with basic
> > authentication, but I was no longer able to access my application, even
> > though it had been working on Solr 3.x with the same type of
> > authentication procedure and credentials.
> >
> > While following option 2, I got stuck right after uploading the
> > security.json file to the ZooKeeper ensemble. The documented check, curling
> > http://localhost:8983/solr/admin/authentication, responded with a 404 Not
> > Found, and then Solr could not connect to ZooKeeper. I had to remove that
> > file from ZooKeeper and restart all Solr nodes.
> >
> > Could someone please show me the way to secure the Admin UI and
> > password-protect SolrCloud? I have a perfectly running system on Solr 3.x
> > with one core, and taking it to SolrCloud 5.2.x in production now seems to
> > be blocked by simple authorization problems.
> >
> > Thank you in advance for any help.
> >
> >
> >
> > 2015-09-10 20:42 GMT+02:00 Noble Paul :
> >
> >> Check this:
> >> https://cwiki.apache.org/confluence/display/solr/Securing+Solr
> >>
> >> There are a couple of bugs in 5.3.0, and a bug-fix release is coming up
> >> over the next few days.
> >>
> >> We don't provide any specific means to restrict access to the admin UI
> >> itself. However, we let users specify fine-grained ACLs on various
> >> operations such as collection-admin-edit, read, etc.
> >>
> >> On Wed, Sep 9, 2015 at 2:35 PM, Merlin Morgenstern
> >>  wrote:
> >> > I just installed SolrCloud 5.3.x and found that the way to secure the
> >> > admin UI has changed. Apparently there is a new plugin which does
> >> > role-based authentication, and all info on how to secure the admin UI
> >> > found on the net is outdated.
> >> >
> >> > I do not need role-based authentication; I just want to add basic
> >> > authentication to the Admin UI.
> >> >
> >> > How do I configure SolrCloud 5.3.x in order to restrict access to the
> >> > Admin UI via Basic Authentication?
> >> >
> >> > Thank you for any help
> >>
> >>
> >>
> >> --
> >> -
> >> Noble Paul
> >>
>
>
>
> --
> -
> Noble Paul
>


Re: How to secure Admin UI with Basic Auth in Solr 5.3.x

2015-09-11 Thread Merlin Morgenstern
OK, I downgraded to Solr 5.2.x.

Unfortunately still no luck. I followed two approaches:

1. Secure it the old-fashioned way, as described here:
http://stackoverflow.com/questions/28043957/how-to-set-apache-solr-admin-password

2. Use the Basic Authentication Plugin, as described here:
http://lucidworks.com/blog/securing-solr-basic-auth-permission-rules/

Both approaches led to unsolved problems.

While following option 1, I was able to secure the Admin UI with basic
authentication, but I was no longer able to access my application, even
though it had been working on Solr 3.x with the same type of authentication
procedure and credentials.
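For reference, the rough shape of what I mean by option 1; the realm name, role and file locations below are just examples, not the exact values from my setup:

  <!-- server/etc/webdefault.xml -->
  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr</web-resource-name>
      <url-pattern>/</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>search-role</role-name>
    </auth-constraint>
  </security-constraint>
  <login-config>
    <auth-method>BASIC</auth-method>
    <realm-name>Solr Realm</realm-name>
  </login-config>

  <!-- server/etc/jetty.xml: register the realm -->
  <Call name="addBean">
    <Arg>
      <New class="org.eclipse.jetty.security.HashLoginService">
        <Set name="name">Solr Realm</Set>
        <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
      </New>
    </Arg>
  </Call>

  # server/etc/realm.properties: user: password, role
  admin: s3cret, search-role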

While following option 2, I got stuck right after uploading the security.json
file to the ZooKeeper ensemble. The documented check, curling
http://localhost:8983/solr/admin/authentication, responded with a 404 Not
Found, and then Solr could not connect to ZooKeeper. I had to remove that
file from ZooKeeper and restart all Solr nodes.
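For completeness, this is the general shape of what I uploaded and how; the credentials value is a placeholder for the salted SHA-256 hash the blog post describes, and the zkhost is an example:

  # upload security.json to ZooKeeper with the bundled script
  server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
    -cmd putfile /security.json /path/to/security.json

  {
    "authentication": {
      "class": "solr.BasicAuthPlugin",
      "credentials": { "solr": "<base64 sha256 hash> <base64 salt>" }
    },
    "authorization": {
      "class": "solr.RuleBasedAuthorizationPlugin",
      "permissions": [ { "name": "security-edit", "role": "admin" } ],
      "user-role": { "solr": "admin" }
    }
  }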

Could someone please show me the way to secure the Admin UI and
password-protect SolrCloud? I have a perfectly running system on Solr 3.x
with one core, and taking it to SolrCloud 5.2.x in production now seems to
be blocked by simple authorization problems.

Thank you in advance for any help.



2015-09-10 20:42 GMT+02:00 Noble Paul :

> Check this https://cwiki.apache.org/confluence/display/solr/Securing+Solr
>
> There are a couple of bugs in 5.3.0, and a bug-fix release is coming up
> over the next few days.
>
> We don't provide any specific means to restrict access to the admin UI
> itself. However, we let users specify fine-grained ACLs on various
> operations such as collection-admin-edit, read, etc.
>
> On Wed, Sep 9, 2015 at 2:35 PM, Merlin Morgenstern
>  wrote:
> > I just installed SolrCloud 5.3.x and found that the way to secure the
> > admin UI has changed. Apparently there is a new plugin which does
> > role-based authentication, and all info on how to secure the admin UI
> > found on the net is outdated.
> >
> > I do not need role-based authentication; I just want to add basic
> > authentication to the Admin UI.
> >
> > How do I configure SolrCloud 5.3.x in order to restrict access to the
> > Admin UI via Basic Authentication?
> >
> > Thank you for any help
>
>
>
> --
> -
> Noble Paul
>


Solr authentication - Error 401 Unauthorized

2015-09-11 Thread Merlin Morgenstern
I have secured solr cloud via basic authentication.

Now I am having difficulties creating cores and getting status information:
Solr keeps telling me that the request is unauthorized. However, I do have
access to the admin UI after logging in.

How do I configure solr to use the basic authentication credentials?
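As a side note, the endpoint the script is calling can at least be tested manually with credentials supplied to curl (user and password here are placeholders):

  curl -u solr:SolrRocks "http://localhost:8983/solr/admin/info/system?wt=json"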

This is the error message:

/opt/solr-5.3.0/bin/solr status

Found 1 Solr nodes:

Solr process 31114 running on port 8983

ERROR: Failed to get system information from http://localhost:8983/solr due
to: org.apache.http.client.ClientProtocolException: Expected JSON response
from server but received: 





Error 401 Unauthorized
HTTP ERROR 401
Problem accessing /solr/admin/info/system. Reason:
    Unauthorized
Powered by Jetty://







How to secure Admin UI with Basic Auth in Solr 5.3.x

2015-09-09 Thread Merlin Morgenstern
I just installed SolrCloud 5.3.x and found that the way to secure the admin
UI has changed. Apparently there is a new plugin which does role-based
authentication, and all info on how to secure the admin UI found on the net
is outdated.

I do not need role-based authentication; I just want to add basic
authentication to the Admin UI.

How do I configure SolrCloud 5.3.x in order to restrict access to the
Admin UI via Basic Authentication?

Thank you for any help


What is the correct path for mysql jdbc connector on Solr?

2015-08-28 Thread Merlin Morgenstern
I have a SolrCloud installation running on 3 machines, and I would like to
import data from MySQL. Unfortunately the import fails because the JDBC
connector cannot be loaded.

My guess is that I have the directory path wrong.

solrconfig.xml:

  

file location:

node1:/opt/solr-5.2.1/dist/mysql-connector-java-5.1.36-bin.jar

error message:


Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
org.apache.solr.handler.dataimport.DataImportHandlerException: Could not
load driver: com.mysql.jdbc.Driver Processing Document # 1

How many directories do I have to go up inside the config ("../...")?

The config is uploaded OK within zookeeper and solr has been restarted.
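For reference, the kind of directive I would expect to need is something along these lines; the relative depth is exactly what I am unsure about, and an absolute path would avoid the question entirely:

  <!-- in solrconfig.xml: the path is resolved relative to the core's instance dir -->
  <lib dir="../../../dist/" regex="mysql-connector-java-.*\.jar" />
  <!-- or sidestep the relative path with an absolute one on every node -->
  <lib dir="/opt/solr-5.2.1/dist/" regex="mysql-connector-java-.*\.jar" />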

Thank you for any help on this!


How to add second Zookeeper to same machine?

2015-08-20 Thread Merlin Morgenstern
I am running 2 dedicated servers on which I plan to install Solrcloud with
2 solr nodes and 3 ZK.

From Stack Overflow I learned that the best method for autostarting
ZooKeeper on Ubuntu 14.04 is to install it via "apt-get install
zookeeperd". I have that running now.

How could I add a second ZooKeeper instance to one machine? The config only
allows one. Or, if this is not possible, what would be the recommended way to
get 3 ZK nodes running on 2 dedicated servers?
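What I have in mind is something like a second config alongside the packaged one, e.g. (ports, paths and host names below are examples):

  # /etc/zookeeper/conf2/zoo.cfg -- second instance on the same host
  tickTime=2000
  initLimit=10
  syncLimit=5
  dataDir=/var/lib/zookeeper2   # needs its own myid file (e.g. containing "2")
  clientPort=2182               # must differ from the first instance
  server.1=server1:2888:3888
  server.2=server1:2889:3889    # second instance on server1: different peer/election ports
  server.3=server2:2888:3888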

I have followed a tutorial where I had that setup available via a bash
script, but the Ubuntu zookeeperd package seems more robust, as it takes care
of zombie processes and provides a startup script as well.

Thank you for any help on this.


Re: Solrcloud node is not coming up

2015-08-19 Thread Merlin Morgenstern
Thank you for the quick answer. I have now learned how to use the Collections
API.

Is there a "better" way to issue the commands then to enter them into the
Browser as URL and getting back JSON?
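For example, is calling the API with curl the usual approach, something like this (collection and node names are made up)?

  curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=192.168.0.3:8983_solr&wt=json"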



2015-08-19 22:23 GMT+02:00 Erick Erickson :

> No, nothing. The graphical view shows collections and the associated
> replicas.
> This new node has no replicas that are part of any collection, so it won't
> show in the graphical view.
>
> If you create a new collection that happens to put a replica on the new
> node,
> it'll then show up as part of that collection in the graphical view.
>
> If you do an ADDREPLICA to the existing collection and specify the new
> machine with the "node" parameter, see:
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api_addreplica
>
> then it should show up.
>
> On Wed, Aug 19, 2015 at 12:42 PM, Merlin Morgenstern
>  wrote:
> > I have a Solrcloud cluster running with 2 nodes, configured with 1 shard
> > and 2 replicas. Now I have added a node on a new server, registered with
> > the same three ZooKeepers. The node shows up inside the tree of the
> > Solrcloud admin GUI under "live nodes".
> >
> > Unfortunately the new node does not appear in the graphical view, and it
> > shows 0 cores available, while the other node's admin interface shows the
> > available core. I have also shut down the second replica server, which is
> > now grayed out. But the third node is still not available.
> >
> > Is there something I have to do in order to add a node, besides
> > registering it? This is the startup command I am using:
> > bin/solr start -cloud -s server/solr2 -p 8983 -z zk1:2181,zk1:2182,zk1:2183 -noprompt
>


Solrcloud node is not coming up

2015-08-19 Thread Merlin Morgenstern
I have a Solrcloud cluster running with 2 nodes, configured with 1 shard
and 2 replicas. Now I have added a node on a new server, registered with the
same three ZooKeepers. The node shows up inside the tree of the Solrcloud
admin GUI under "live nodes".

Unfortunately the new node does not appear in the graphical view, and it
shows 0 cores available, while the other node's admin interface shows the
available core. I have also shut down the second replica server, which is now
grayed out. But the third node is still not available.

Is there something I have to do in order to add a node, besides registering
it? This is the startup command I am using:
bin/solr start -cloud -s server/solr2 -p 8983 -z zk1:2181,zk1:2182,zk1:2183
-noprompt


Difficulties in getting Solrcloud running

2015-08-19 Thread Merlin Morgenstern
Hi everybody,

I am trying to set up SolrCloud on Ubuntu, and somehow the graph on the admin
interface does not show up. It is simply blank. The tree view is available.

This is a test installation on one machine.

There are 3 zookeepers running.

I start two solr nodes like this:

solr-5.2.1$ bin/solr start -cloud -s server/solr1 -p 8983 -z
zk1:2181,zk1:2182,zk1:2183 -noprompt

solr-5.2.1$ bin/solr start -cloud -s server/solr2 -p 8984 -z
zk1:2181,zk1:2182,zk1:2183 -noprompt

zk1 resolves to a local interface, 10.0.0.120.

It all looks OK; there are no error messages.
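One thing I have not done yet is create a collection; if the graph view only shows collections, that may simply be the missing step. I would try something like (collection name and sizing made up):

  bin/solr create -c testcollection -shards 1 -replicationFactor 2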

Thank you in advance for any help on this


utf8 encoding for solr not working

2012-03-16 Thread Merlin Morgenstern
I am running Solr 3.5 with a MySQL data connector. Solr is configured to
use UTF-8 as the encoding:




Unfortunately Solr encodes special characters like "ä" into HTML entities:

ä

which leads to problems when cutting strings with PHP's mb_substr().

How can I configure Solr to deliver plain UTF-8 instead of HTML entities?
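As a workaround I am considering decoding the entities on the PHP side before cutting, roughly like this:

  // decode HTML entities back to UTF-8 characters so mb_substr() counts correctly
  $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
  $teaser = mb_substr($text, 0, 200, 'UTF-8');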

Thank you for any help.


Re: Error while decoding %DC (Ü) from URL - results in ?

2011-08-28 Thread Merlin Morgenstern
I double-checked all the code on that page, and it looks like everything is
in UTF-8 and works just fine. The problematic URLs are always requested by
bots such as Googlebot. It looks like they are operating with a different
encoding. The page itself has a UTF-8 meta tag.

So it looks like I have to find a way to detect the encoding and convert
appropriately. This should be a common Solr problem if all search engines
treat UTF-8 that way, right?

Any ideas on how to fix that? Is there maybe special Solr functionality for
this?
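Something along these lines is what I have in mind if there is no Solr-side option (a sketch, assuming that anything which is not valid UTF-8 is Latin-1):

  $term = urldecode($q);
  // %DC is "Ü" in ISO-8859-1; if the decoded bytes are not valid UTF-8,
  // assume Latin-1 and convert before sending the query to Solr
  if (!mb_check_encoding($term, 'UTF-8')) {
      $term = mb_convert_encoding($term, 'UTF-8', 'ISO-8859-1');
  }
  $term = trim($term);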

2011/8/27 François Schiettecatte 

> Merlin
>
> Ü encodes to two bytes in UTF-8 (C3 9C) and to one byte in ISO-8859-1 (DC),
> so it looks like there is a charset mismatch somewhere.
>
>
> Cheers
>
> François
>
>
>
> On Aug 27, 2011, at 6:34 AM, Merlin Morgenstern wrote:
>
> > Hello,
> >
> > I am having problems with searches issued by spiders that contain the
> > URL-encoded character "Ü".
> >
> > For example in: "Übersetzung"
> >
> > The Solr log shows the following query request: /suche/%DCbersetzung
> > which has been translated into the Solr query: q=?ersetzung
> >
> > If you enter the search term directly into the search box as a user, it
> > results in /suche/Übersetzung, which returns perfect results.
> >
> > I am decoding the URL within PHP: $term = trim(urldecode($q));
> >
> > Somehow urldecode() translates the character Ü (%DC) into a "?", which is
> > an illegal first character in Solr.
> >
> > I tried it without urldecode(), with rawurldecode() and with
> > utf8_decode(), but none of those helped.
> >
> > Thank you for any help or hint on how to solve that problem.
> >
> > Regards, Merlin
>
>


Error while decoding %DC (Ü) from URL - results in ?

2011-08-27 Thread Merlin Morgenstern
Hello,

I am having problems with searches issued by spiders that contain the
URL-encoded character "Ü".

For example in: "Übersetzung"

The Solr log shows the following query request: /suche/%DCbersetzung
which has been translated into the Solr query: q=?ersetzung

If you enter the search term directly into the search box as a user, it
results in /suche/Übersetzung, which returns perfect results.

I am decoding the URL within PHP: $term = trim(urldecode($q));

Somehow urldecode() translates the character Ü (%DC) into a "?", which is an
illegal first character in Solr.

I tried it without urldecode(), with rawurldecode() and with utf8_decode(),
but none of those helped.

Thank you for any help or hint on how to solve that problem.

Regards, Merlin


Re: strip html from data

2011-08-15 Thread Merlin Morgenstern
2011/8/11 Ahmet Arslan 

> > Is there a way to strip the html tags completly and not
> > index them? If not,
> > how to I retrieve the results without html tags?
>
> How do you push documents to solr? You need to strip html tags before the
> analysis chain. For example, if you are using Data Import Handler, you can
> use HTMLStripTransformer.
>
>  http://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer
>

Thank you everybody for your help and all the detailed explanations. This
solution fixed the problem.
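For anyone finding this later, a minimal sketch of the HTMLStripTransformer usage from that wiki page (entity name, query and column names are made up):

  <entity name="doc" transformer="HTMLStripTransformer"
          query="SELECT id, body FROM documents">
    <field column="id"   name="id"/>
    <field column="body" name="text" stripHTML="true"/>
  </entity>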

Best regards.


Re: strip html from data

2011-08-11 Thread Merlin Morgenstern
I am sorry, but I do not really understand the difference between the indexed
data and the returned result set.

I look at the "returned" dataset via this request:
solr/select/?q=id:533563&terms=true

which gives me HTML tags like these: 

I also tried to turn on TermsComponent, but it did not change anything:
solr/select/?q=id:533563&terms=true

The schema browser does not show any HTML tags inside the text field, just
the indexed words of that one document.

Is there a way to strip the HTML tags completely and not index them? If not,
how do I retrieve the results without HTML tags?

Thank you for your help.



2011/8/9 Erick Erickson 

> OK, what does "not working" mean? You never answered Markus' question:
>
> "Are you looking at the returned result set or what you've actually
> indexed?
> Analyzers are not run on the stored data, only on indexed data."
>
> If "not working" means that your returned results contain the markup, then
> you're confusing indexing and storing. All the analysis chains operate
> on data sent into the indexing process. But the verbatim data is *stored*
> prior to (or separate from) indexing.
>
> So my assumption is that you see data returned in the document with
> markup, which is just as it should be, and there's no problem at all. And
> your
> actual indexed terms (try looking at the data with TermsComponent, or
> admin/schema browser) will NOT have any markup.
>
> Perhaps you can back up a bit and describe what's failing .vs. what you
> expect.
>
> Best
> Erick
>
> On Mon, Aug 8, 2011 at 6:50 AM, Merlin Morgenstern
>  wrote:
> > Unfortunately I still can't get it running. The code I am using is the
> > following:
> >
> >
> >
> > > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> >
> >
> >
> >
> >
> >
> >
> > > generateWordParts="1" generateNumberParts="1" catenateWords="0"
> > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> >
> >
> >
> >
> >
> > I also tried this one:
> >
> >
> >  > positionIncrementGap="100" autoGeneratePhraseQueries="true">
> >   
> >
> >  
> >  
> >
> > 
> >
> >   > required="false"/>
> >
> > None of those worked. I restarted Solr after the schema update and
> > reindexed the data. No change; the HTML tags are still in there.
> >
> > Any other ideas? Maybe this is a bug in Solr? I am using Solr 3.3.0 on
> > SUSE Linux.
> >
> > Thank you for any help on this.
> >
> >
> >
> > 2011/7/25 Mike Sokolov 
> >
> >> Hmm that looks like it's working fine.  I stand corrected.
> >>
> >>
> >>
> >> On 07/25/2011 12:24 PM, Markus Jelsma wrote:
> >>
> >>> I've seen that issue too and read comments on the list, yet I've never
> >>> had trouble with the order; don't know what's going on. Check this
> >>> analyzer, I've moved the charFilter to the bottom:
> >>>
> >>> 
> >>> 
> >>>  >>> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >>> catenateAll="0"
> >>> splitOnCaseChange="1"/>
> >>> 
> >>>  >>> ignoreCase="false" expand="true"/>
> >>>  >>> words="stopwords.txt"/>
> >>> 
> >>>  >>> protected="protwords.txt"
> >>> language="Dutch"/>
> >>> 
> >>> 
> >>> 
> >>>
> >>> The analysis chain still does its job as i expect for the input:
> >>> bla bla
> >>>
> >>> Index Analyzer
> >>> org.apache.solr.analysis.HTMLStripCharFilterFactory
> >>> {luceneMatchVersion=LUCENE_34}
> >>> text    bla bla
> >>> org.apache.solr

Re: strip html from data

2011-08-08 Thread Merlin Morgenstern
Unfortunately I still can't get it running. The code I am using is the
following:

















I also tried this one:


 
   

  
  

 

  

None of those worked. I restarted Solr after the schema update and reindexed
the data. No change; the HTML tags are still in there.

Any other ideas? Maybe this is a bug in Solr? I am using Solr 3.3.0 on SUSE
Linux.

Thank you for any help on this.



2011/7/25 Mike Sokolov 

> Hmm that looks like it's working fine.  I stand corrected.
>
>
>
> On 07/25/2011 12:24 PM, Markus Jelsma wrote:
>
>> I've seen that issue too and read comments on the list, yet I've never had
>> trouble with the order; don't know what's going on. Check this analyzer,
>> I've moved the charFilter to the bottom:
>>
>> 
>> 
>> > generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>> catenateAll="0"
>> splitOnCaseChange="1"/>
>> 
>> > ignoreCase="false" expand="true"/>
>> > words="stopwords.txt"/>
>> 
>> > protected="protwords.txt"
>> language="Dutch"/>
>> 
>> 
>> 
>>
>> The analysis chain still does its job as i expect for the input:
>> bla bla
>>
>> Index Analyzer
>> org.apache.solr.analysis.HTMLStripCharFilterFactory
>> {luceneMatchVersion=LUCENE_34}
>> text    bla bla
>> org.apache.solr.analysis.WhitespaceTokenizerFactory
>> {luceneMatchVersion=LUCENE_34}
>> position1   2
>> term text   bla bla
>> startOffset 6   10
>> endOffset   9   13
>> org.apache.solr.analysis.WordDelimiterFilterFactory
>> {splitOnCaseChange=1,
>> generateNumberParts=1, catenateWords=1, luceneMatchVersion=LUCENE_34,
>> generateWordParts=1, catenateAll=0, catenateNumbers=1}
>> position1   2
>> term text   bla bla
>> startOffset 6   10
>> endOffset   9   13
>> typewordword
>> org.apache.solr.analysis.LowerCaseFilterFactory
>> {luceneMatchVersion=LUCENE_34}
>> position1   2
>> term text   bla bla
>> startOffset 6   10
>> endOffset   9   13
>> typewordword
>> org.apache.solr.analysis.SynonymFilterFactory {synonyms=synonyms.txt,
>> expand=true, ignoreCase=false, luceneMatchVersion=LUCENE_34}
>> position1   2
>> term text   bla bla
>> typewordword
>> startOffset 6   10
>> endOffset   9   13
>> org.apache.solr.analysis.StopFilterFactory {words=stopwords.txt,
>> ignoreCase=false, luceneMatchVersion=LUCENE_34}
>> position1   2
>> term text   bla bla
>> typewordword
>> startOffset 6   10
>> endOffset   9   13
>> org.apache.solr.analysis.ASCIIFoldingFilterFactory
>> {luceneMatchVersion=LUCENE_34}
>> position1   2
>> term text   bla bla
>> typewordword
>> startOffset 6   10
>> endOffset   9   13
>> org.apache.solr.analysis.SnowballPorterFilterFactory
>> {protected=protwords.txt,
>> language=Dutch, luceneMatchVersion=LUCENE_34}
>> position1   2
>> term text   bla bla
>> keyword false   false
>> typewordword
>> startOffset 6   10
>> endOffset   9   13
>> org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory
>> {luceneMatchVersion=LUCENE_34}
>> position1   2
>> term text   bla bla
>> keyword false   false
>> typewordword
>> startOffset     6   10
>> endOffset   9   13
>>
>>
>> On Monday 25 July 2011 18:07:29 Mike Sokolov wrote:
>>
>>
>>> Hmm - I'm not sure about that; see
>>> https://issues.apache.org/jira/browse/SOLR-2119
>>>
>>> On 07/25/2011 12:01 PM, Markus Jelsma wrote:
>>>
>>>
>>>> charFilters are executed first regardless of their position in the
>>>> analyzer.
>>>>

Re: strip html from data

2011-07-25 Thread Merlin Morgenstern
Sounds logical. I just changed it to the following, restarted, and reindexed
with a commit:

 
















 

Unfortunately that did not fix the error. There are still tags inside the
data. I believe there are fewer of them than before, but I cannot prove that.
Fact is, there are still HTML tags inside the data.

Any other ideas what the problem could be?
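For reference, a minimal index-time analyzer of the kind Markus describes would look roughly like this (field type name and filter selection are only an example):

  <fieldType name="text_html" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>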





2011/7/25 Markus Jelsma 

> You've got three analyzer elements; I wonder what that would do. You need
> to add the char filter to the index-time analyzer.
>
> On Monday 25 July 2011 13:09:14 Merlin Morgenstern wrote:
> > Hi there,
> >
> > I am trying to strip HTML tags from the data before adding the documents
> > to the index. To do that I altered schema.xml like this:
> >
> >   > positionIncrementGap="100" autoGeneratePhraseQueries="true">
> > 
> > 
> >  > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> > 
> > 
> > 
> > 
> > 
> > 
> >  > generateWordParts="1" generateNumberParts="1" catenateWords="0"
> > catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
> > 
> > 
> > 
> > 
> > 
> > 
> >  
> > 
> >  
> >
> > 
> >  > required="false"/>
> > 
> >
> > Unfortunately this does not work; the HTML tags are still present after
> > restarting and reindexing. I also tried HTMLStripTransformer, but this
> > did not work either.
> >
> > Does anybody have an idea how to get this done? Thank you in advance for
> > any hint.
> >
> > Merlin
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>


copyField destination does not exist

2011-03-28 Thread Merlin Morgenstern
Hi there,

I am trying to get Solr to index MySQL tables. It seems I have misconfigured
schema.xml:

HTTP ERROR: 500

Severe errors in solr configuration.

-
org.apache.solr.common.SolrException: copyField destination :'text' does
not exist
at

org.apache.solr.schema.IndexSchema.registerCopyField(IndexSchema.java:685)


My config looks like this:

 



 

 id
 
 phrase


What is wrong with this config? The type should be OK.
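For comparison, the copyField target has to be declared as a field itself, along these lines (field and type names here are only an example):

  <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
  <copyField source="phrase" dest="text"/>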

-- 
http://www.fastmail.fm - Choose from over 50 domains or use your own



Can't retrieve data

2011-03-28 Thread Merlin Morgenstern
Hi there,

I am new to Solr and have just installed it on a SUSE box with a MySQL
backend.

The install and the MySQL connector seem to be working; I can see the Solr
admin interface. Then I tried to index a table with about 0.5 million rows.
That seemed to work as well. However, I get 0 results when querying it.
Something seems to be wrong. I also did a commit after the full import.

Here is the response from the import:


responseHeader: status 0, QTime 3
config: data-config.xml
command: full-import
status: idle
Total Requests made to DataSource: 1
Total Rows Fetched: 404575
Total Documents Skipped: 0
Full Dump Started: 2011-03-28 12:47:36
Indexing completed. Added/Updated: 0 documents. Deleted 0 documents.
Committed: 2011-03-28 12:47:42
Optimized: 2011-03-28 12:47:42
Total Documents Processed: 0
Time taken: 0:0:6.141

This response format is experimental. It is likely to change in the future.



Data-config.xml looks like this:












Thank you for any hint to get this running.
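For reference, the general shape of the file follows the standard JDBC example; the connection details, query and field names below are placeholders, not my real values:

  <dataConfig>
    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
                url="jdbc:mysql://localhost/mydb" user="dbuser" password="dbpass"/>
    <document>
      <entity name="item" query="SELECT id, title, body FROM items">
        <field column="id"    name="id"/>
        <field column="title" name="title"/>
        <field column="body"  name="text"/>
      </entity>
    </document>
  </dataConfig>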

-- 
http://www.fastmail.fm - Email service worth paying for. Try it for free