Re: Information for solr-user@lucene.apache.org

Subject: Re: How to properly query indexed Data
From: Carl Roberts
To: solr-user-i...@lucene.apache.org, solr-user-ow...@lucene.apache.org
Date: Sat, 21 May 2016 09:08:29 -0400
Let's try this one (solr-user-digest-subscr...@lucene.apache.org) - maybe a real person will answer there.

On 5/21/16 9:09 AM, Carl Roberts wrote:
And, these responses are just weird. Do they mean this user list is obsolete? Is Solr no longer supported via a user list where we can ask questions?

On 5/21/16 9:08 AM, solr-user-h...@lucene.apache.org wrote:
Hi! This is the ezmlm program. I'm managing the solr-user@lucene.apache.org mailing list. I'm working for my owner, who can be reached at solr-user-ow...@lucene.apache.org. No information has been provided for this list.

--- Administrative commands for the solr-user list ---

I can handle administrative requests automatically. Please do not send them to the list address! Instead, send your message to the correct command address:

To subscribe to the list, send a message to:
To remove your address from the list, send a message to:
Send mail to the following for info and FAQ for this list:

Similar addresses exist for the digest list:

To get messages 123 through 145 (a maximum of 100 per request), mail:
To get an index with subject and author for messages 123-456, mail:
They are always returned as sets of 100, max 2000 per request, so you'll actually get 100-499.
To receive all messages with the same subject as message 12345, send a short message to:

The messages should contain one line or word of text to avoid being treated as sp@m, but I will ignore their content. Only the ADDRESS you send to is important.

You can start a subscription for an alternate address, for example "john@host.domain"; just add a hyphen and your address (with '=' instead of '@') after the command word:
To stop subscription for this address, mail:
In both cases, I'll send a confirmation message to that address. When you receive it, simply reply to it to complete your subscription.

If despite following these instructions you do not get the desired results, please contact my owner at solr-user-ow...@lucene.apache.org. Please be patient, my owner is a lot slower than I am ;-)

--- Enclosed is a copy of the request I received.
[raw SMTP/DKIM headers of the enclosed request omitted]
How to perform a contains query
Hi,

Sorry to ask this question again, but I had some recent issues with SPAM filtering, so I don't know if someone responded before or not.

Basically, I am looking for a way to query a field for a substring in it, with functionality similar to what Java String.contains would return. For example, if I have these 2 summary field values:

"summary": "Apache Tomcat 7.x before 7.0.10 does not follow ServletSecurity annotations, which allows remote attackers to bypass intended access restrictions via HTTP requests to a web application.",

"summary": "Apache Tomcat 7.0.0 through 7.0.6 and 6.0.0 through 6.0.30 does not enforce the maxHttpHeaderSize limit for requests involving the NIO HTTP connector, which allows remote attackers to cause a denial of service (OutOfMemoryError) via a crafted request.",

I want to be able to provide a query that states: give me all records that have a field summary that contains "Apache Tomcat 7", so that both fields above are returned. Is there a way to do that?

Regards,
Joe

On 5/23/16 12:37 PM, Chris Hostetter wrote:
The mailing list you are looking for is "solr-user@lucene.apache.org". solr-user-info is an automated bot for giving you info about the list; solr-user-owner is for contacting the human moderators of the mailing list with help.

: Date: Sat, 21 May 2016 09:07:00 -0400
: From: Carl Roberts
: To: solr-user-i...@lucene.apache.org, solr-user-ow...@lucene.apache.org
: Subject: Re: How to properly query indexed Data
:
: What is a reasonable time to expect an answer to questions in this user list?
:
: On 5/18/16 8:55 PM, Carl Roberts wrote:
: > Hi,
: >
: > I am using Solr 4.10.3 and the default URL for the GUI that comes with Solr:
: >
: > This is the URL: http://10.1.161.23:8983/solr/#/nvd-rss/query
: >
: > I have the following entries with field summary that are indexed.
: >
: > If I search for summary:"Apache Tomcat 7" I only get 10 results, and the ones with Apache Tomcat 7.0.0 in the summary are missing from the results.
: >
: > If I search for summary:"Apache Tomcat 7.0.0" I only get the 3 with "Apache Tomcat 7.0.0" in the summary.
: >
: > How do I get all of them? What filter should I use? I guess I am looking for a filter that says this:
: >
: > Give me all entries that start with "Apache Tomcat 7", where 7 can be followed by 0.0 as in 7.0.0, or it can be followed by x as in 7.x, or anything else.
: >
: > How do I do that?
: >
: > ~/dev/temp$ grep "Apache Tomcat 7" apache-tomcat-query.txt
: >
: > "summary": "Apache Tomcat 7.x before 7.0.10 does not follow ServletSecurity annotations, which allows remote attackers to bypass intended access restrictions via HTTP requests to a web application.",
: >
: > "summary": "Apache Tomcat 7.0.0 through 7.0.6 and 6.0.0 through 6.0.30 does not enforce the maxHttpHeaderSize limit for requests involving the NIO HTTP connector, which allows remote attackers to cause a denial of service (OutOfMemoryError) via a crafted request.",
: >
: > "summary": "org/apache/catalina/core/DefaultInstanceManager.java in Apache Tomcat 7.x before 7.0.22 does not properly restrict ContainerServlets in the Manager application, which allows local users to gain privileges by using an untrusted web application to access the Manager application's functionality.",
: >
: > "summary": "Unrestricted file upload vulnerability in Apache Tomcat 7.x before 7.0.40, in certain situations involving outdated java.io.File code and a custom JMX configuration, allows remote attackers to execute arbitrary code by uploading and accessing a JSP file.",
: >
: > "summary": "A certain tomcat7 package for Apache Tomcat 7 in Red Hat Enterprise Linux (RHEL) 7 allows remote attackers to cause a denial of service (CPU consumption) via a crafted request. NOTE: this vulnerability exists because of an unspecified regression.",
: >
: > "summary": "Apache Tomcat 7.0.0 through 7.0.3, 6.0.x, and 5.5.x, when running within a SecurityManager, does not make the ServletContext attribute read-only, which allows local web applications to read or write files outside of the intended working directory, as demonstrated using a directory traversal attack.",
: >
: > "summary": "Apache Tomcat 7.0.11, when web.xml has no login configuration, does not follow security constraints, which allows remote a
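One option worth trying here (a sketch, not something from the thread itself): Solr 4.8 and later ship the ComplexPhraseQParserPlugin, which allows wildcards inside phrase queries, so a phrase-prefix query can match both "Apache Tomcat 7.x" and "Apache Tomcat 7.0.0". This assumes the nvd-rss core from the thread and a tokenized text field for summary:

    curl -G "http://localhost:8983/solr/nvd-rss/select" \
         --data-urlencode 'q={!complexphrase inOrder=true}summary:"Apache Tomcat 7*"' \
         --data-urlencode 'wt=json'

With the standard tokenizer, "7.0.0" is kept as a single token, which is why the plain phrase query summary:"Apache Tomcat 7" misses those documents; the trailing wildcard makes the last term a prefix match, so 7, 7.x, and 7.0.0 all qualify.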
Need Help with custom ZIPURLDataSource class
Hi,

I created a custom ZIPURLDataSource class to unzip the content from an HTTP URL for an XML ZIP file, and it seems to be working (at least I have no errors), but no data is imported.

Here is my configuration in rss-data-config.xml (the XML tags were stripped by the mail archive; the surviving attributes are shown):

    <entity pk="link"
            url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
            processor="XPathEntityProcessor" forEach="/nvd/entry"
            transformer="DateFormatTransformer">
      [field definitions stripped by the archive; one surviving mapping is
       xpath="/nvd/entry/vulnerable-software-list/product" commonField="false"]
    </entity>

Attached is the ZIPURLDataSource.java file. It actually unzips and saves the raw XML to disk, which I have verified to be a valid XML file. The file has one or more entries (here is an example):

    [sample <entry> from the feed, tags stripped by the archive: entry CVE-1999-0001,
     with a vulnerable-software list of cpe:/o:freebsd:freebsd, cpe:/o:openbsd:openbsd
     and cpe:/o:bsdi:bsd_os versions, CVSS 2 metrics (score 5.0, NETWORK/LOW/NONE,
     partial availability impact), OSVDB and CONFIRM references, published 1999-12-30,
     last modified 2010-12-16, and the summary "ip_input.c in BSD-derived TCP/IP
     implementations allows remote attackers to cause a denial of service (crash or
     hang) via crafted packets."]

Here is the curl command:

    curl http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import

And here is the output from the console for Jetty:

    main{StandardDirectoryReader(segments_1:1:nrt)}
    2407 [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.CoreContainer registering core: nvd-rss
    2409 [main] INFO org.apache.solr.servlet.SolrDispatchFilter user.dir=/Users/carlroberts/dev/solr-4.10.3/example
    2409 [main] INFO org.apache.solr.servlet.SolrDispatchFilter SolrDispatchFilter.init() done
    2431 [main] INFO org.eclipse.jetty.server.AbstractConnector Started SocketConnector@0.0.0.0:8983
    2450 [searcherExecutor-6-thread-1] INFO org.apache.solr.core.SolrCore [nvd-rss] webapp=null path=null params={event=firstSearcher&q=static+firstSearcher+warming+in+solrconfig.xml&distrib=false} hits=0 status=0 QTime=43
    2451 [searcherExecutor-6-thread-1] INFO org.apache.solr.core.SolrCore QuerySenderListener done.
    2451 [searcherExecutor-6-thread-1] INFO org.apache.solr.handler.component.SpellCheckComponent Loading spell index for spellchecker: default
    2451 [searcherExecutor-6-thread-1] INFO org.apache.solr.handler.component.SpellCheckComponent Loading spell index for spellchecker: wordbreak
    2452 [searcherExecutor-6-thread-1] INFO org.apache.solr.handler.component.SuggestComponent Loading suggester index for: mySuggester
    2452 [searcherExecutor-6-thread-1] INFO org.apache.solr.spelling.suggest.SolrSuggester reload()
    2452 [searcherExecutor-6-thread-1] INFO org.apache.solr.spelling.suggest.SolrSuggester build()
    2459 [searcherExecutor-6-thread-1] INFO org.apache.solr.core.SolrCore [nvd-rss] Registered new searcher Searcher@df9e84e[nvd-rss] main{StandardDirectoryReader(segments_1:1:nrt)}
    8371 [qtp1640586218-17] INFO org.apache.solr.handler.dataimport.DataImporter Loading DIH Configuration: rss-data-config.xml
    8379 [qtp1640586218-17] INFO org.apache.solr.handler.dataimport.DataImporter Data Configuration loaded successfully
    8383 [Thread-15] INFO org.apache.solr.handler.dataimport.DataImporter Starting Full Import
    8384 [qtp1640586218-17] INFO org.apache.solr.core.SolrCore [nvd-rss] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=15
    8396 [Thread-15] INFO org.apache.solr.handler.dataimport.SimplePropertiesWriter Read dataimport.properties
    23431 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommi
Need help importing data
Hi,

I have set log4j logging to level DEBUG and I have also modified the code to see what is being imported, and I can see the nextRow() records, and the import is successful; however, I have no data. Can someone please help me figure this out?

Here is the logging output:

    ow: r1={{id=CVE-2002-2353, cve=CVE-2002-2353, cwe=CWE-264, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r3={{id=CVE-2002-2353, cve=CVE-2002-2353, cwe=CWE-264, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: URL={url}
    2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r1={{id=CVE-2002-2354, cve=CVE-2002-2354, cwe=CWE-20, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r3={{id=CVE-2002-2354, cve=CVE-2002-2354, cwe=CWE-20, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: URL={url}
    2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r1={{id=CVE-2002-2355, cve=CVE-2002-2355, cwe=CWE-255, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r3={{id=CVE-2002-2355, cve=CVE-2002-2355, cwe=CWE-255, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: URL={url}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r1={{id=CVE-2002-2356, cve=CVE-2002-2356, cwe=CWE-264, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r3={{id=CVE-2002-2356, cve=CVE-2002-2356, cwe=CWE-264, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: URL={url}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r1={{id=CVE-2002-2357, cve=CVE-2002-2357, cwe=CWE-119, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r3={{id=CVE-2002-2357, cve=CVE-2002-2357, cwe=CWE-119, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: URL={url}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r1={{id=CVE-2002-2358, cve=CVE-2002-2358, cwe=CWE-79, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r3={{id=CVE-2002-2358, cve=CVE-2002-2358, cwe=CWE-79, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: URL={url}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r1={{id=CVE-2002-2359, cve=CVE-2002-2359, cwe=CWE-79, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r3={{id=CVE-2002-2359, cve=CVE-2002-2359, cwe=CWE-79, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: URL={url}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r1={{id=CVE-2002-2360, cve=CVE-2002-2360, cwe=CWE-264, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r3={{id=CVE-2002-2360, cve=CVE-2002-2360, cwe=CWE-264, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: URL={url}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPath
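Two quick checks that can help at this point (a sketch, assuming the nvd-rss core from earlier in the thread): ask DIH for its status, and run a match-all query to compare counts:

    curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=status"
    curl "http://127.0.0.1:8983/solr/nvd-rss/select?q=*:*&rows=0&wt=json"

If status reports rows fetched but numFound stays 0, the likely causes are a missing commit or documents being silently dropped - which is what the pk/uniqueKey mismatch described in the next message turned out to be.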
Re: Need Help with custom ZIPURLDataSource class
NVM - I have this working. The problem was this: pk="link" in rss-data-config.xml, but the uniqueKey in schema.xml is not link - it is id.

From rss-data-config.xml (tags stripped by the archive; surviving attributes shown):

    <entity pk="link"
            url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
            processor="XPathEntityProcessor" forEach="/nvd/entry">
      [field definitions stripped by the archive; two carried commonField="true"]
    </entity>

From schema.xml:

    <uniqueKey>id</uniqueKey>

What really bothers me is that there were no errors output by Solr to indicate this type of misconfiguration, and all the messages that Solr gave indicated the import was successful. This lack of appropriate error reporting is a pain, especially for someone learning Solr. Switching pk="link" to pk="id" solved the problem and I was then able to import the data.

On 1/23/15, 6:34 PM, Carl Roberts wrote:

Hi,

I created a custom ZIPURLDataSource class to unzip the content from an HTTP URL for an XML ZIP file, and it seems to be working (at least I have no errors), but no data is imported. Here is my configuration in rss-data-config.xml (tags stripped by the archive):

    <entity url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
            processor="XPathEntityProcessor" forEach="/nvd/entry"
            transformer="DateFormatTransformer">
      [field definitions stripped; one surviving mapping is
       xpath="/nvd/entry/vulnerable-software-list/product" commonField="false"]
    </entity>

Attached is the ZIPURLDataSource.java file. It actually unzips and saves the raw XML to disk, which I have verified to be a valid XML file. The file has one or more entries (here is an example):

    [sample <entry> from the feed, tags stripped by the archive: entry CVE-1999-0001
     with its vulnerable-software CPE list, CVSS 2 metrics, OSVDB and CONFIRM
     references, and the summary "ip_input.c in BSD-derived TCP/IP implementations
     allows remote attackers to cause a denial of service (crash or hang) via
     crafted packets."]

Here is the curl command:

    curl http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import

And here is the output from the console for Jetty:

    main{StandardDirectoryReader(segments_1:1:nrt)}
    2407 [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.CoreContainer registering core: nvd-rss
    2409 [main] INFO org.apache.solr.servlet.SolrDispatchFilter user.dir=/Users/carlroberts/dev/solr-4.10.3/example
    2409 [main] INFO org.apache.solr.servlet.SolrDispatchFilter SolrDispatchFilter.init() done
    2431 [main] INFO org.eclipse.jetty.server.AbstractConnector Started SocketConnector@0.0.0.0:8983
    2450 [searcherExecutor-6-thread-1] INFO org.apache.solr.core.SolrCore [nvd-rss] webapp=null path=null params={event=firstSearcher&q=static+firstSearcher+warming+in+solrconfig.xml&distrib=false} hits=0 status=0 QTime=43
    2451 [searcherExecutor-6-thread-1] INFO org.apache.solr.core.SolrCore QuerySenderListener done.
    2451 [searcherExecutor-6-thread-1] INFO org.apache.solr.handler.component.SpellCheckComponent Loading spell index
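For anyone hitting the same wall: the entity's pk should name the same column as the schema's uniqueKey. A sketch of the corrected entity (field list abbreviated; the @id xpath is an assumption based on the NVD 2.0 feed, where each <entry> carries its CVE id as an attribute):

    <entity pk="id"
            url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
            processor="XPathEntityProcessor" forEach="/nvd/entry">
      <field column="id" xpath="/nvd/entry/@id" commonField="true"/>
      <!-- remaining field mappings as in the original config -->
    </entity>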
Re: Need help importing data
NVM - I figured this out. The problem was this: pk="link" in rss-data-config.xml, but the uniqueKey in schema.xml is not link - it is id.

From rss-data-config.xml (tags stripped by the archive; surviving attributes shown):

    <entity pk="link"
            url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
            processor="XPathEntityProcessor" forEach="/nvd/entry">
      [field definitions stripped by the archive; two carried commonField="true"]
    </entity>

From schema.xml:

    <uniqueKey>id</uniqueKey>

What really bothers me is that there were no errors output by Solr to indicate this type of misconfiguration, and all the messages that Solr gave indicated the import was successful. This lack of appropriate error reporting is a pain, especially for someone learning Solr. Switching pk="link" to pk="id" solved the problem and I was then able to import the data.

On 1/23/15, 9:39 PM, Carl Roberts wrote:

Hi,

I have set log4j logging to level DEBUG and I have also modified the code to see what is being imported, and I can see the nextRow() records, and the import is successful; however, I have no data. Can someone please help me figure this out?

Here is the logging output:

    ow: r1={{id=CVE-2002-2353, cve=CVE-2002-2353, cwe=CWE-264, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r3={{id=CVE-2002-2353, cve=CVE-2002-2353, cwe=CWE-264, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: URL={url}
    2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r1={{id=CVE-2002-2354, cve=CVE-2002-2354, cwe=CWE-20, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r3={{id=CVE-2002-2354, cve=CVE-2002-2354, cwe=CWE-20, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: URL={url}
    2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r1={{id=CVE-2002-2355, cve=CVE-2002-2355, cwe=CWE-255, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,606- INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r3={{id=CVE-2002-2355, cve=CVE-2002-2355, cwe=CWE-255, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: URL={url}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r1={{id=CVE-2002-2356, cve=CVE-2002-2356, cwe=CWE-264, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r3={{id=CVE-2002-2356, cve=CVE-2002-2356, cwe=CWE-264, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: URL={url}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r1={{id=CVE-2002-2357, cve=CVE-2002-2357, cwe=CWE-119, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r3={{id=CVE-2002-2357, cve=CVE-2002-2357, cwe=CWE-119, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: URL={url}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r1={{id=CVE-2002-2358, cve=CVE-2002-2358, cwe=CWE-79, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:251]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r3={{id=CVE-2002-2358, cve=CVE-2002-2358, cwe=CWE-79, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:221]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: URL={url}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:227]-org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow: r1={{id=CVE-2002-2359, cve=CVE-2002-2359, cwe=CWE-79, $forEach=/nvd/entry}}
    2015-01-23 21:28:04,607- INFO-[Thread-15]-[XPathEntityProcessor.java:251
How do you parse the data in a field that is returned from a query?
Hi,

How can I parse the data in a field that is returned from a query? Basically, I have a multi-valued field that contains values such as these that are returned from a query:

"cpe:/o:freebsd:freebsd:1.1.5.1", "cpe:/o:freebsd:freebsd:2.2.3", "cpe:/o:freebsd:freebsd:2.2.2", "cpe:/o:freebsd:freebsd:2.2.5", "cpe:/o:freebsd:freebsd:2.2.4", "cpe:/o:freebsd:freebsd:2.0.5", "cpe:/o:freebsd:freebsd:2.2.6", "cpe:/o:freebsd:freebsd:2.1.6.1", "cpe:/o:freebsd:freebsd:2.0.1", "cpe:/o:freebsd:freebsd:2.2", "cpe:/o:freebsd:freebsd:2.0", "cpe:/o:openbsd:openbsd:2.3", "cpe:/o:freebsd:freebsd:3.0", "cpe:/o:freebsd:freebsd:1.1", "cpe:/o:freebsd:freebsd:2.1.6", "cpe:/o:openbsd:openbsd:2.4", "cpe:/o:bsdi:bsd_os:3.1", "cpe:/o:freebsd:freebsd:1.0", "cpe:/o:freebsd:freebsd:2.1.7", "cpe:/o:freebsd:freebsd:1.2", "cpe:/o:freebsd:freebsd:2.1.5", "cpe:/o:freebsd:freebsd:2.1.7.1"

And my problem is that I need to strip the cpe:/o part, and I also need to tokenize words using the (:) as a separator, so that I can then search for "freebsd 1.1" or "openbsd 2.4" or just "freebsd".

Thanks in advance.
Joe
Re: How do you parse the data in a field that is returned from a query?
Sorry if I was not clear. What I am asking is this: How can I parse the data during import to tokenize it by (:) and strip the cpe:/o?

On 1/24/15, 3:28 PM, Alexandre Rafalovitch wrote:

You are using keywords here that seem to contradict each other. Or your use case is not clear.

Specifically, you are saying you are getting stuff from a (Solr?) query. So, the results are now outside of Solr. Then you are asking for help to strip stuff off it. Well, it's outside of Solr, do whatever you want with it! But then at the end, you say you want to search for whatever you stripped off. So, that should be back in Solr again?

Or are you asking something along these lines:
1. I have a multiValued field with the following sample content... (it does not matter to Solr where it comes from)
2. I wanted it returned as is, but I want to be able to find documents when somebody searches for X, Y, or Z
3. What would be the best analyzer chain to be able to do so?

Regards,
Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 24 January 2015 at 15:04, Carl Roberts wrote:

Hi,

How can I parse the data in a field that is returned from a query? Basically, I have a multi-valued field that contains values such as these that are returned from a query:

"cpe:/o:freebsd:freebsd:1.1.5.1", "cpe:/o:freebsd:freebsd:2.2.3", "cpe:/o:freebsd:freebsd:2.2.2", "cpe:/o:freebsd:freebsd:2.2.5", "cpe:/o:freebsd:freebsd:2.2.4", "cpe:/o:freebsd:freebsd:2.0.5", "cpe:/o:freebsd:freebsd:2.2.6", "cpe:/o:freebsd:freebsd:2.1.6.1", "cpe:/o:freebsd:freebsd:2.0.1", "cpe:/o:freebsd:freebsd:2.2", "cpe:/o:freebsd:freebsd:2.0", "cpe:/o:openbsd:openbsd:2.3", "cpe:/o:freebsd:freebsd:3.0", "cpe:/o:freebsd:freebsd:1.1", "cpe:/o:freebsd:freebsd:2.1.6", "cpe:/o:openbsd:openbsd:2.4", "cpe:/o:bsdi:bsd_os:3.1", "cpe:/o:freebsd:freebsd:1.0", "cpe:/o:freebsd:freebsd:2.1.7", "cpe:/o:freebsd:freebsd:1.2", "cpe:/o:freebsd:freebsd:2.1.5", "cpe:/o:freebsd:freebsd:2.1.7.1"

And my problem is that I need to strip the cpe:/o part, and I also need to tokenize words using the (:) as a separator, so that I can then search for "freebsd 1.1" or "openbsd 2.4" or just "freebsd".

Thanks in advance.
Joe
Re: How do you parse the data in a field that is returned from a query?
Yes - I am using DIH and I am reading the info from an XML file using the URL data source, and I want to strip the cpe:/o and tokenize the data by (:) during import, so I can then search it as I've described. So, my question is this: Is there any built-in logic via a transformer class that could do this? If not, how would you recommend I do this?

Regards,
Joe

On 1/24/15, 3:38 PM, Jack Krupansky wrote:

Or, maybe... he's using DIH and getting these values from an RDBMS database query and now wants to index them in Solr. Who knows! It might be simplest to transform the colons to spaces and use a normal text field. Although you could use a custom text field type that used a regex tokenizer which treated the colons as token separators.

-- Jack Krupansky

On Sat, Jan 24, 2015 at 3:28 PM, Alexandre Rafalovitch wrote:

You are using keywords here that seem to contradict each other. Or your use case is not clear.

Specifically, you are saying you are getting stuff from a (Solr?) query. So, the results are now outside of Solr. Then you are asking for help to strip stuff off it. Well, it's outside of Solr, do whatever you want with it! But then at the end, you say you want to search for whatever you stripped off. So, that should be back in Solr again?

Or are you asking something along these lines:
1. I have a multiValued field with the following sample content... (it does not matter to Solr where it comes from)
2. I wanted it returned as is, but I want to be able to find documents when somebody searches for X, Y, or Z
3. What would be the best analyzer chain to be able to do so?

Regards,
Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 24 January 2015 at 15:04, Carl Roberts wrote:

Hi,

How can I parse the data in a field that is returned from a query? Basically, I have a multi-valued field that contains values such as these that are returned from a query:

"cpe:/o:freebsd:freebsd:1.1.5.1", "cpe:/o:freebsd:freebsd:2.2.3", "cpe:/o:freebsd:freebsd:2.2.2", "cpe:/o:freebsd:freebsd:2.2.5", "cpe:/o:freebsd:freebsd:2.2.4", "cpe:/o:freebsd:freebsd:2.0.5", "cpe:/o:freebsd:freebsd:2.2.6", "cpe:/o:freebsd:freebsd:2.1.6.1", "cpe:/o:freebsd:freebsd:2.0.1", "cpe:/o:freebsd:freebsd:2.2", "cpe:/o:freebsd:freebsd:2.0", "cpe:/o:openbsd:openbsd:2.3", "cpe:/o:freebsd:freebsd:3.0", "cpe:/o:freebsd:freebsd:1.1", "cpe:/o:freebsd:freebsd:2.1.6", "cpe:/o:openbsd:openbsd:2.4", "cpe:/o:bsdi:bsd_os:3.1", "cpe:/o:freebsd:freebsd:1.0", "cpe:/o:freebsd:freebsd:2.1.7", "cpe:/o:freebsd:freebsd:1.2", "cpe:/o:freebsd:freebsd:2.1.5", "cpe:/o:freebsd:freebsd:2.1.7.1"

And my problem is that I need to strip the cpe:/o part, and I also need to tokenize words using the (:) as a separator, so that I can then search for "freebsd 1.1" or "openbsd 2.4" or just "freebsd".

Thanks in advance.
Joe
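A field type along the lines Jack describes, as a minimal sketch (the name cpe_text is made up here, and the prefix pattern would need tuning against real data):

    <fieldType name="cpe_text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <!-- strip the leading "cpe:/o:" (or cpe:/a:, cpe:/h:) prefix -->
        <charFilter class="solr.PatternReplaceCharFilterFactory"
                    pattern="^cpe:/[oah]:" replacement=""/>
        <!-- treat the remaining colons as token separators -->
        <tokenizer class="solr.PatternTokenizerFactory" pattern=":" group="-1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

With that analysis, "cpe:/o:freebsd:freebsd:1.1" indexes as the tokens freebsd, freebsd, 1.1, so phrase queries such as "freebsd 1.1" and single-term queries such as freebsd both match, while the stored value is returned unchanged.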
Re: How do you parse the data in a field that is returned from a query?
Via this rss-data-config.xml file and a class that I wrote (attached) to download an XML file from a ZIP URL (tags stripped by the archive; surviving attributes shown):

    <dataSource ... readTimeout="3"/>

    <entity url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
            processor="XPathEntityProcessor" forEach="/nvd/entry">
      [field definitions stripped, all commonField="false"; surviving xpaths include
       /nvd/entry/vulnerable-configuration/logical-test/fact-ref/@name,
       /nvd/entry/vulnerable-software-list/product,
       /nvd/entry/published-datetime and /nvd/entry/last-modified-datetime]
    </entity>

    <entity url="http://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2003.xml.zip"
            processor="XPathEntityProcessor" forEach="/nvd/entry">
      [same field definitions as above]
    </entity>

On 1/24/15, 3:45 PM, Jack Krupansky wrote:

How are you currently importing data?

-- Jack Krupansky

On Sat, Jan 24, 2015 at 3:42 PM, Carl Roberts wrote:

Sorry if I was not clear. What I am asking is this: How can I parse the data during import to tokenize it by (:) and strip the cpe:/o?

On 1/24/15, 3:28 PM, Alexandre Rafalovitch wrote:

You are using keywords here that seem to contradict each other. Or your use case is not clear. Specifically, you are saying you are getting stuff from a (Solr?) query. So, the results are now outside of Solr. Then you are asking for help to strip stuff off it. Well, it's outside of Solr, do whatever you want with it! But then at the end, you say you want to search for whatever you stripped off. So, that should be back in Solr again?

Or are you asking something along these lines:
1. I have a multiValued field with the following sample content... (it does not matter to Solr where it comes from)
2. I wanted it returned as is, but I want to be able to find documents when somebody searches for X, Y, or Z
3. What would be the best analyzer chain to be able to do so?

Regards,
Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 24 January 2015 at 15:04, Carl Roberts wrote:

Hi,

How can I parse the data in a field that is returned from a query? Basically, I have a multi-valued field that contains values such as these that are returned from a query:

"cpe:/o:freebsd:freebsd:1.1.5.1", "cpe:/o:freebsd:freebsd:2.2.3", "cpe:/o:freebsd:freebsd:2.2.2", "cpe:/o:freebsd:freebsd:2.2.5", "cpe:/o:freebsd:freebsd:2.2.4", "cpe:/o:freebsd:freebsd:2.0.5", "cpe:/o:freebsd:freebsd:2.2.6", "cpe:/o:freebsd:freebsd:2.1.6.1", "cpe:/o:freebsd:freebsd:2.0.1", "cpe:/o:freebsd:freebsd:2.2", "cpe:/o:freebsd:freebsd:2.0", "cpe:/o:openbsd:openbsd:2.3", "cpe:/o:freebsd:freebsd:3.0", "cpe:/o:freebsd:freebsd:1.1", "cpe:/o:freebsd:freebsd:2.1.6", "cpe:/o:openbsd:openbsd:2.4", "cpe:/o:bsdi:bsd_os:3.1", "cpe:/o:freebsd:freebsd:1.0", "cpe:/o:freebsd:freebsd:2.1.7", "cpe:/o:freebsd:freebsd:1.2", "cpe:/o:freebsd:freebsd:2.1.5", "cpe:/o:freebsd:freebsd:2.1.7.1"

And my problem is that I need to strip the cpe:/o part, and I also need to tokenize words using the (:) as a separator, so that I can then search for "freebsd 1.1" or "openbsd 2.4" or just "freebsd".

Thanks in advance.
Joe

    package org.apache.solr.handler.dataimport;

    import java.util.zip.*;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import java.io.*;
    import java.net.URL;
    import java.net.URLConnection;
    import java.nio.charset.StandardCharsets;
    import java.util.Properties;
    import java.ut
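The attachment is cut off in the archive. For orientation, here is a minimal sketch of such a data source against the Solr 4.x DIH DataSource contract - a reconstruction under stated assumptions (one XML entry per ZIP, fixed timeouts), not the author's attached file:

    package org.apache.solr.handler.dataimport;

    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.net.URL;
    import java.net.URLConnection;
    import java.nio.charset.StandardCharsets;
    import java.util.Properties;
    import java.util.zip.ZipInputStream;

    /** Fetches a ZIP from an HTTP URL and exposes its first entry as a Reader. */
    public class ZIPURLDataSource extends DataSource<Reader> {

      @Override
      public void init(Context context, Properties initProps) {
        // Connect/read timeouts could be read from initProps here;
        // fixed defaults below keep the sketch short.
      }

      @Override
      public Reader getData(String query) {
        try {
          // "query" is the url attribute from the entity in rss-data-config.xml
          URLConnection conn = new URL(query).openConnection();
          conn.setConnectTimeout(5000);
          conn.setReadTimeout(10000);
          ZipInputStream zis = new ZipInputStream(conn.getInputStream());
          if (zis.getNextEntry() == null) {
            throw new RuntimeException("Empty ZIP at " + query);
          }
          // The stream is now positioned at the first entry's uncompressed bytes.
          return new InputStreamReader(zis, StandardCharsets.UTF_8);
        } catch (Exception e) {
          throw new RuntimeException("Failed to read " + query, e);
        }
      }

      @Override
      public void close() {
        // Nothing held open here; DIH closes the Reader returned by getData.
      }
    }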
Re: How do you parse the data in a field that is returned from a query?
The unzipped XML that I am reading looks like this:

    [sample <entry>, tags stripped by the archive: entry CVE-1999-0001 with its
     vulnerable-software CPE list (FreeBSD, OpenBSD and BSDi versions), CVSS 2
     metrics, OSVDB and CONFIRM references, and the summary "ip_input.c in
     BSD-derived TCP/IP implementations allows remote attackers to cause a denial
     of service (crash or hang) via crafted packets."]

On 1/24/15, 3:49 PM, Carl Roberts wrote:

Via this rss-data-config.xml file and a class that I wrote (attached) to download an XML file from a ZIP URL (tags stripped by the archive):

    <dataSource ... readTimeout="3"/>
    <entity url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
            processor="XPathEntityProcessor" forEach="/nvd/entry"> ... </entity>
    <entity url="http://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2003.xml.zip"
            processor="XPathEntityProcessor" forEach="/nvd/entry"> ... </entity>

On 1/24/15, 3:45 PM, Jack Krupansky wrote:

How are you currently importing data?

-- Jack Krupansky

On Sat, Jan 24, 2015 at 3:42 PM, Carl Roberts wrote:

Sorry if I was not clear. What I am asking is this: How can I parse the data during import to tokenize it by (:) and strip the cpe:/o?

On 1/24/15, 3:28 PM, Alexandre Rafalovitch wrote:

You are using keywords here that seem to contradict each other. Or your use case is not clear.
Specifically, you are saying you are getting stuff from a (Solr?) query. So, the results are now outside of Solr. Then you are asking for help to strip stuff off it. Well, it's outside of Solr, do whatever you want with it!
Re: How do you parse the data in a field that is returned from a query?
Thanks Jack.

On 1/24/15, 3:57 PM, Jack Krupansky wrote:

Take a look at the RegexTransformer. Or, in some cases, you may need to use the raw ScriptTransformer. See:
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler

-- Jack Krupansky

On Sat, Jan 24, 2015 at 3:49 PM, Carl Roberts wrote:

Via this rss-data-config.xml file and a class that I wrote (attached) to download an XML file from a ZIP URL (tags stripped by the archive):

    <entity url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
            processor="XPathEntityProcessor" forEach="/nvd/entry"> ... </entity>
    <entity url="http://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2003.xml.zip"
            processor="XPathEntityProcessor" forEach="/nvd/entry"> ... </entity>

On 1/24/15, 3:45 PM, Jack Krupansky wrote:

How are you currently importing data?

-- Jack Krupansky

On Sat, Jan 24, 2015 at 3:42 PM, Carl Roberts <carl.roberts.zap...@gmail.com> wrote:

Sorry if I was not clear. What I am asking is this: How can I parse the data during import to tokenize it by (:) and strip the cpe:/o?

On 1/24/15, 3:28 PM, Alexandre Rafalovitch wrote:

You are using keywords here that seem to contradict each other. Or your use case is not clear.

Specifically, you are saying you are getting stuff from a (Solr?) query. So, the results are now outside of Solr. Then you are asking for help to strip stuff off it. Well, it's outside of Solr, do whatever you want with it! But then at the end, you say you want to search for whatever you stripped off. So, that should be back in Solr again?

Or are you asking something along these lines:
1. I have a multiValued field with the following sample content... (it does not matter to Solr where it comes from)
2. I wanted it returned as is, but I want to be able to find documents when somebody searches for X, Y, or Z
3. What would be the best analyzer chain to be able to do so?

Regards,
Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 24 January 2015 at 15:04, Carl Roberts <carl.roberts.zap...@gmail.com> wrote:

Hi,

How can I parse the data in a field that is returned from a query? Basically, I have a multi-valued field that contains values such as these that are returned from a query:

"cpe:/o:freebsd:freebsd:1.1.5.1", "cpe:/o:freebsd:freebsd:2.2.3", "cpe:/o:freebsd:freebsd:2.2.2", "cpe:/o:freebsd:freebsd:2.2.5", "cpe:/o:freebsd:freebsd:2.2.4", "cpe:/o:freebsd:freebsd:2.0.5", "cpe:/o:freebsd:freebsd:2.2.6", "cpe:/o:freebsd:freebsd:2.1.6.1", "cpe:/o:freebsd:freebsd:2.0.1", "cpe:/o:freebsd:freebsd:2.2", "cpe:/o:freebsd:freebsd:2.0", "cpe:/o:openbsd:openbsd:2.3", "cpe:/o:freebsd:freebsd:3.0", "cpe:/o:freebsd:freebsd:1.1", "cpe:/o:freebsd:freebsd:2.1.6", "cpe:/o:openbsd:openbsd:2.4", "cpe:/o:bsdi:bsd_os:3.1", "cpe:/o:freebsd:freebsd:1.0", "cpe:/o:freebsd:freebsd:2.1.7", "cpe:/o:freebsd:freebsd:1.2", "cpe:/o:freebsd:freebsd:2.1.5", "cpe:/o:freebsd:freebsd:2.1.7.1"

And my problem is that I need to strip the cpe:/o part, and I also need to tokenize words using the (:) as a separator, so that I can then search for "freebsd 1.1" or "openbsd 2.4" or just "freebsd".

Thanks in advance.
Joe
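For reference, the shape Jack's RegexTransformer suggestion takes in this thread's config - a sketch mirroring the entity used earlier (the column name products is illustrative): rewriting colons to spaces at import time lets an ordinary text field tokenize the parts.

    <entity url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
            processor="XPathEntityProcessor" forEach="/nvd/entry"
            transformer="RegexTransformer">
      <field column="products"
             xpath="/nvd/entry/vulnerable-software-list/product"
             regex=":" replaceWith=" "/>
    </entity>

This is in fact the approach that ends up working later in the thread ("Re: Cannot reindex to add a new field").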
What is the recommended way to import and update index records?
Hi,

What is the recommended way to import and update index records? I've read the documentation and I've experimented with full-import and delta-import, and I am not seeing the desired results.

Basically, I have 15 RSS feeds that I am importing through rss-data-config.xml. The first RSS feed should be a full import, and the ones that follow may contain the same id, in which case the existing id in the index should be updated from the record in the new RSS feed. Also, there may be new records in the RSS feeds that follow the first one, in which case I want them added to the index.

When I try full-import for each entity, the index is cleared and I just end up with the records for the last import. When I try full-import for each entity with the clean=false parameter, all the records from each entity are added to the index and I end up with duplicate records. When I try delta-import for the entities that follow the first one, I don't get any new index records.

How should I do this?

Regards,
Joe
Re: What is the recommended way to import and update index records?
Also, if I try full-import and clean=false with the same XML file, I end up with more records each time the import runs. How can I make Solr just add the records that are new by id, and update the ones that have an id that matches the one in the existing index?

On 1/27/15, 11:32 AM, Carl Roberts wrote:

Hi,

What is the recommended way to import and update index records? I've read the documentation and I've experimented with full-import and delta-import, and I am not seeing the desired results.

Basically, I have 15 RSS feeds that I am importing through rss-data-config.xml. The first RSS feed should be a full import, and the ones that follow may contain the same id, in which case the existing id in the index should be updated from the record in the new RSS feed. Also, there may be new records in the RSS feeds that follow the first one, in which case I want them added to the index.

When I try full-import for each entity, the index is cleared and I just end up with the records for the last import. When I try full-import for each entity with the clean=false parameter, all the records from each entity are added to the index and I end up with duplicate records. When I try delta-import for the entities that follow the first one, I don't get any new index records.

How should I do this?

Regards,
Joe
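The command shape that matches this goal, as a sketch (assuming the nvd-rss core; clean=false keeps the existing index and relies on the uniqueKey to replace rather than duplicate, which is the point Alexandre makes below):

    curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&commit=true"

An individual entity from rss-data-config.xml can also be run on its own by appending &entity=<name>.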
Re: Is there a way to pass in proxy settings to Solr?
Hi Shawn,

I got it to work by using this script to start my instance of Solr:

    java -Dhttp.proxyHost=http-proxy-server -Dhttp.proxyPort=80 \
         -Dhttps.proxyHost=http-proxy-server -Dhttps.proxyPort=80 \
         -Dlog4j.debug=true \
         -Dlog4j.configuration=file:///Users/carlroberts/dev/solr-4.10.3/log4j.xml \
         -Dsolr.solr.home=../ -classpath "./:lib/*:./log4j.xml" -jar start.jar

Regards,
Joe

On 1/22/15, 11:46 AM, Shawn Heisey wrote:

On 1/22/2015 9:18 AM, Carl Roberts wrote:
Is there a way to pass in proxy settings to Solr? The reason that I am asking this question is that I am trying to run the DIH RSS example, and it is not working when I try to import the RSS feed URL, because the code in Solr comes back with an unknown host exception due to the proxy that we use at work. If I use the curl tool and the environment variable http_proxy to access the RSS feed directly, it works, but it appears Solr does not use that environment variable, because it is throwing this error:

    39642 [Thread-15] ERROR org.apache.solr.handler.dataimport.URLDataSource - Exception thrown while getting data

Checking the code, URLDataSource seems to use the URL capability that comes with Java itself. The system properties on this page are very likely to affect objects that come with Java:

http://docs.oracle.com/javase/7/docs/api/java/net/doc-files/net-properties.html#Proxies

You would need to set these properties on the java command line that starts your servlet container, with the -D option.

Thanks,
Shawn
Re: What is the recommended way to import and update index records?
Hi Alex, thanks for clarifying this for me. I'll take a look at my setup of the uniqueKey. Perhaps I did not set it right.

On 1/27/15, 12:09 PM, Alexandre Rafalovitch wrote:

What do you mean by "update"? If you mean partial update, DIH does not do it AFAIK. If you mean replace, it should. If you are getting duplicate records, maybe your uniqueKey is not set correctly? clean=false looks to me like the right approach for incremental updates.

Regards,
Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 27 January 2015 at 11:43, Carl Roberts wrote:

Also, if I try full-import and clean=false with the same XML file, I end up with more records each time the import runs. How can I make Solr just add the records that are new by id, and update the ones that have an id that matches the one in the existing index?

On 1/27/15, 11:32 AM, Carl Roberts wrote:

Hi,

What is the recommended way to import and update index records? I've read the documentation and I've experimented with full-import and delta-import, and I am not seeing the desired results.

Basically, I have 15 RSS feeds that I am importing through rss-data-config.xml. The first RSS feed should be a full import, and the ones that follow may contain the same id, in which case the existing id in the index should be updated from the record in the new RSS feed. Also, there may be new records in the RSS feeds that follow the first one, in which case I want them added to the index.

When I try full-import for each entity, the index is cleared and I just end up with the records for the last import. When I try full-import for each entity with the clean=false parameter, all the records from each entity are added to the index and I end up with duplicate records. When I try delta-import for the entities that follow the first one, I don't get any new index records.

How should I do this?

Regards,
Joe
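The uniqueKey setup Alexandre is referring to, as a sketch (the field name id matches the schema described elsewhere in this thread):

    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <uniqueKey>id</uniqueKey>

With this in place, importing a document whose id already exists replaces the stored document instead of adding a duplicate, so repeated clean=false imports converge on one record per id.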
Cannot reindex to add a new field
Hi,

I have tried to reindex to add a new field named product-info, and no matter what I do, I cannot get the new field to appear in the index after import via DIH.

Here is the rss-data-config.xml configuration (field product-info is the new field I added; tags stripped by the archive, surviving attributes shown):

    <dataSource ... readTimeout="3"/>
    <entity url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
            processor="XPathEntityProcessor" forEach="/nvd/entry"
            transformer="RegexTransformer">
      <field column="product-info"
             xpath="/nvd/entry/vulnerable-software-list/product"
             commonField="false"/>  <!-- the new field -->
      [remaining field definitions stripped; surviving xpaths include
       /nvd/entry/vulnerable-configuration/logical-test/fact-ref/@name,
       /nvd/entry/vulnerable-software-list/product,
       /nvd/entry/published-datetime and /nvd/entry/last-modified-datetime]
    </entity>

Here is the section that contains the new product-info field in schema.xml (tags stripped by the archive): product-info is declared stored="true" multiValued="true", in the same manner as the vulnerable-software field.

Field product-info is defined in the same manner as the vulnerable-software field, and it pulls the same data as the vulnerable-software-list field via the same xpath, yet the vulnerable-software field shows up in the results and the product-info field does not.

Here is the response for a query after the import takes place - field product-info is missing:

    ~/dev/solr-4.10.3$ curl "http://localhost:8983/solr/nvd-rss/select?wt=json&indent=true&q=*:*&start=0&rows=1"
    {
      "responseHeader":{
        "status":0,
        "QTime":1,
        "params":{
          "indent":"true",
          "start":"0",
          "q":"*:*",
          "wt":"json",
          "rows":"1"}},
      "response":{"numFound":6717,"start":0,"docs":[
          {
            "id":"CVE-1999-0001",
            "summary":"ip_input.c in BSD-derived TCP/IP implementations allows remote attackers to cause a denial of service (crash or hang) via crafted packets.",
            "vulnerable-configuration":["cpe:/o:bsdi:bsd_os:3.1",
              "cpe:/o:freebsd:freebsd:1.0", "cpe:/o:freebsd:freebsd:1.1",
              "cpe:/o:freebsd:freebsd:1.1.5.1", "cpe:/o:freebsd:freebsd:1.2",
              "cpe:/o:freebsd:freebsd:2.0", "cpe:/o:freebsd:freebsd:2.0.5",
              "cpe:/o:freebsd:freebsd:2.1.5", "cpe:/o:freebsd:freebsd:2.1.6",
              "cpe:/o:freebsd:freebsd:2.1.6.1", "cpe:/o:freebsd:freebsd:2.1.7",
              "cpe:/o:freebsd:freebsd:2.1.7.1", "cpe:/o:freebsd:freebsd:2.2",
              "cpe:/o:freebsd:freebsd:2.2.3", "cpe:/o:freebsd:freebsd:2.2.4",
              "cpe:/o:freebsd:freebsd:2.2.5", "cpe:/o:freebsd:freebsd:2.2.6",
              "cpe:/o:freebsd:freebsd:2.2.8", "cpe:/o:freebsd:freebsd:3.0",
              "cpe:/o:openbsd:openbsd:2.3", "cpe:/o:openbsd:openbsd:2.4",
              "cpe:/o:freebsd:freebsd:2.2.2", "cpe:/o:freebsd:freebsd:2.0.1"],
            "cve":"CVE-1999-0001",
            "cwe":"CWE-20",
            "published":"1999-12-30T00:00:00.000-05:00",
            "vulnerable-software":["cpe:/o:freebsd:freebsd:2.2.8",
              "cpe:/o:freebsd:freebsd:1.1.5.1", "cpe:/o:freebsd:freebsd:2.2.3",
              "cpe:/o:freebsd:freebsd:2.2.2", "cpe:/o:freebsd:freebsd:2.2.5",
              "cpe:/o:freebsd:freebsd:2.2.4", "cpe:/o:freebsd:freebsd:2.0.5",
              "cpe:/o:freebsd:freebsd:2.2.6", "cpe:/o:freebsd:freebsd:2.1.6.1",
              "cpe:/o:freebsd:freebsd:2.0.1", "cpe:/o:freebsd:freebsd:2.2",
              "cpe:/o:freebsd:freebsd:2.0", "cpe:/o:openbsd:openbsd:2.3",
              "cpe:/o:freebsd:freebsd:3.0", "cpe:/o:freebsd:freebsd:1.1",
              "cpe:/o:freebsd:freebsd:2.1.6", "cpe:/o:openbsd:openbsd:2.4",
              "cpe:/o:bsdi:bsd_os:3.1", "cpe:/o:freebsd:freebsd:1.0",
              "cpe:/o:freebsd:freebsd:2.1.7", "cpe:/o:freebsd:freebsd:1.2",
              "cpe:/o:freebsd:freebsd:2.1.5", "cpe:/o:freebsd:freebsd:2.1.7.1"],
            "modified":"2010-12-16T00:00:00.000-05:00",
            "_version_":1491484873657942016}]
      }}

This is what I have tried so far:

- Restarted Solr to reload the schema, and reimported via full-import and clean=true.
- Invoked the following command to reload the schema, and reimported via full-import and clean=true:

    curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=nvd-rss"

I am not sure why this is not working, as everything seems correct to me in my setup. Could this be a bug?

Regards,
Joe
Re: After adding field to schema, the field is not being returned in results.
I too am running into what appears to be the same thing. Everything works and data is imported, but I cannot see the new field in the result.
Re: Cannot reindex to add a new field
Well - I got this to work. I noticed that when log4j is enabled, product-info was in the import as product-info=[], so I then played with the field and got this definition to work in the rss-data-config.xml file:

    <field column="product-info"
           xpath="/nvd/entry/vulnerable-software-list/product"
           commonField="false" regex=":" replaceWith=" "/>

Don't ask me why the other one didn't work, as I think it should have worked also.

On 1/27/15, 3:42 PM, Carl Roberts wrote:

Hi,

I have tried to reindex to add a new field named product-info, and no matter what I do, I cannot get the new field to appear in the index after import via DIH.

Here is the rss-data-config.xml configuration (field product-info is the new field I added; tags stripped by the archive):

    <entity url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
            processor="XPathEntityProcessor" forEach="/nvd/entry"
            transformer="RegexTransformer">
      <field column="product-info"
             xpath="/nvd/entry/vulnerable-software-list/product"
             commonField="false"/>  <!-- the new field -->
      [remaining field definitions stripped by the archive]
    </entity>

Here is the section that contains the new product-info field in schema.xml (tags stripped by the archive): product-info is declared stored="true" multiValued="true", in the same manner as the vulnerable-software field.

Field product-info is defined in the same manner as the vulnerable-software field, and it pulls the same data as the vulnerable-software-list field via the same xpath, yet the vulnerable-software field shows up in the results and the product-info field does not.

Here is the response for a query after the import takes place - field product-info is missing:

    ~/dev/solr-4.10.3$ curl "http://localhost:8983/solr/nvd-rss/select?wt=json&indent=true&q=*:*&start=0&rows=1"
    { "responseHeader":{ "status":0, "QTime":1, "params":{ "indent":"true", "start":"0", "q":"*:*", "wt":"json", "rows":"1"}},
      "response":{"numFound":6717,"start":0,"docs":[
        { "id":"CVE-1999-0001",
          "summary":"ip_input.c in BSD-derived TCP/IP implementations allows remote attackers to cause a denial of service (crash or hang) via crafted packets.",
          "vulnerable-configuration":["cpe:/o:bsdi:bsd_os:3.1", "cpe:/o:freebsd:freebsd:1.0", "cpe:/o:freebsd:freebsd:1.1", "cpe:/o:freebsd:freebsd:1.1.5.1", "cpe:/o:freebsd:freebsd:1.2", "cpe:/o:freebsd:freebsd:2.0", "cpe:/o:freebsd:freebsd:2.0.5", "cpe:/o:freebsd:freebsd:2.1.5", "cpe:/o:freebsd:freebsd:2.1.6", "cpe:/o:freebsd:freebsd:2.1.6.1", "cpe:/o:freebsd:freebsd:2.1.7", "cpe:/o:freebsd:freebsd:2.1.7.1", "cpe:/o:freebsd:freebsd:2.2", "cpe:/o:freebsd:freebsd:2.2.3", "cpe:/o:freebsd:freebsd:2.2.4", "cpe:/o:freebsd:freebsd:2.2.5", "cpe:/o:freebsd:freebsd:2.2.6", "cpe:/o:freebsd:freebsd:2.2.8", "cpe:/o:freebsd:freebsd:3.0", "cpe:/o:openbsd:openbsd:2.3", "cpe:/o:openbsd:openbsd:2.4", "cpe:/o:freebsd:freebsd:2.2.2", "cpe:/o:freebsd:freebsd:2.0.1"],
          "cve":"CVE-1999-0001",
          "cwe":"CWE-20",
          "published":"1999-12-30T00:00:00.000-05:00",
          "vulnerable-software":["cpe:/o:freebsd:freebsd:2.2.8", "cpe:/o:freebsd:freebsd:1.1.5.1", "cpe:/o:freebsd:freebsd:2.2.3", "cpe:/o:freebsd:freebsd:2.2.2",
Re: Cannot reindex to add a new field
reebsd:freebsd:2.2.5", "cpe:/o:freebsd:freebsd:2.2.4", "cpe:/o:freebsd:freebsd:2.0.5", "cpe:/o:freebsd:freebsd:2.2.6", "cpe:/o:freebsd:freebsd:2.1.6.1", "cpe:/o:freebsd:freebsd:2.0.1", "cpe:/o:freebsd:freebsd:2.2", "cpe:/o:freebsd:freebsd:2.0", "cpe:/o:openbsd:openbsd:2.3", "cpe:/o:freebsd:freebsd:3.0", "cpe:/o:freebsd:freebsd:1.1", "cpe:/o:freebsd:freebsd:2.1.6", "cpe:/o:openbsd:openbsd:2.4", "cpe:/o:bsdi:bsd_os:3.1", "cpe:/o:freebsd:freebsd:1.0", "cpe:/o:freebsd:freebsd:2.1.7", "cpe:/o:freebsd:freebsd:1.2", "cpe:/o:freebsd:freebsd:2.1.5", "cpe:/o:freebsd:freebsd:2.1.7.1"], "modified":"2010-12-16T00:00:00.000-05:00", "_version_":1491493094540967936}] }} On 1/27/15, 5:19 PM, Alexandre Rafalovitch wrote: One xpath per field definition. You had two fields for the same xpath. If they were the same value, the best bet would be to deal with it via copyField in the schema. No idea why regex thing makes a difference, are you sure the other field is also still being indexed? Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/ On 27 January 2015 at 17:11, Carl Roberts wrote: Well - I got this to work. Noticed that when log4j is enabled product-info was in the import as product-info=[], so I then played with the field and got this definition to work in the rss-data-config.xml file: Don't ask me why the other one didn't work, as I think it should have worked also. On 1/27/15, 3:42 PM, Carl Roberts wrote: Hi, I have tried to reindex to add a new field named product-info and no matter what I do, I cannot get the new field to appear in the index after import via DIH. Here is the rss-data-config.xml configuration (field product-info is the new field I added): https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"; processor="XPathEntityProcessor" forEach="/nvd/entry" transformer="RegexTransformer"> ** Here is the section that contains the new product-info field in schema.xml: ** Field product-info is defined in the same manner as vulnerable-software field and it pulls the same data as vulnerable-software list field via the same xpath, yet vulnerable-software field shows up in the results and product-info field does not. 
Here is the response for a query after the import takes place - field product-info is missing:

~/dev/solr-4.10.3$ curl "http://localhost:8983/solr/nvd-rss/select?wt=json&indent=true&q=*:*&start=0&&rows=1"
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "indent":"true", "start":"0", "q":"*:*", "wt":"json", "rows":"1"}},
  "response":{"numFound":6717,"start":0,"docs":[
    {
      "id":"CVE-1999-0001",
      "summary":"ip_input.c in BSD-derived TCP/IP implementations allows remote attackers to cause a denial of service (crash or hang) via crafted packets.",
      "vulnerable-configuration":["cpe:/o:bsdi:bsd_os:3.1", "cpe:/o:freebsd:freebsd:1.0",
        "cpe:/o:freebsd:freebsd:1.1", "cpe:/o:freebsd:freebsd:1.1.5.1", "cpe:/o:freebsd:freebsd:1.2",
        "cpe:/o:freebsd:freebsd:2.0", "cpe:/o:freebsd:freebsd:2.0.5", "cpe:/o:freebsd:freebsd:2.1.5",
        "cpe:/o:freebsd:freebsd:2.1.6", "cpe:/o:freebsd:freebsd:2.1.6.1", "cpe:/o:freebsd:freebsd:2.1.7",
        "cpe:/o:freebsd:freebsd:2.1.7.1", "cpe:/o:freebsd:freebsd:2.2", "cpe:/o:freebsd:freebsd:2.2.3",
        "cpe:/o:freebsd:freebsd:2.2.4", "cpe:/o:freebsd:freebsd:2.2.5", "cpe:/o:freebsd:freebsd:2.2.6",
        "cpe:/o:freebsd:freebsd:2.2.8", "cpe:/o:freebsd:freebsd:3.0", "cpe:/o:openbsd:openbsd:2.3",
        "cpe:/o:openbsd:openbsd:2.4", "cpe:/o:freebsd:freebsd:2.2.2", "cpe:/o:freebsd:freebsd:2.0.1"],
      "cve":"CVE-1999-0001",
      "cwe":"CWE-20",
      "published":"1999-12-30T00:00:00.000-05:00",
      "vulnerable-software":["cpe:/o:freebsd:freebsd:2.2.8", "cpe:/o:freebsd:freebsd:1.1.5.1",
        "cpe:/o:freebsd:freebsd:2.2.3", "cpe:/o:freebsd:freebsd:2.2.2", "cpe:/o:freebsd:freebsd:2.2.5",
        "cpe:/o:freebsd:freebsd:2.2.4", "cpe:/o:freebsd:freebsd:2.0.5", "cpe:/o:freebsd:freebsd:2.2.6",
        "cpe:/o:freebsd:freebsd:2.1.6.1", "cpe:/o:freebsd:freebsd:2.0.1", "cpe:/o:freebsd:freebsd:2.2",
        "cpe:/o:freebsd:freebsd:2.0", "cpe:/o:openbsd:openbsd:2.3", "cpe:/o:freebsd:freebsd:3.0",
        "cpe:/o:freebsd:freebsd:1.1", "cpe:/o:freebsd:freebsd:2.1.6", "cpe:/o:openbsd:openbsd:2.4",
        "cpe:/o:bsdi:bsd_os:3.1", "cpe:/o:freebsd:freebsd:1.0", "cpe:/o:freebsd:freebsd:2.1.7",
        "cpe:/o:freebsd:freebsd:1.2", "cpe:/o:freebsd:freebsd:2.1.5", "cpe:/o:freebsd:freebsd:2.1.7.1"],
      "modified":"2010-12-16T00:00:00.000-05:00",
      "_version_":1491484873657942016}]
  }}

This is what I have tried so far:

Restarted Solr to reload the schema and reimported via full-import and clean=true.

Invoked the following command to reload the schema and reimported via full-import and clean=true:

curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=nvd-rss"

I am not sure why this is not working, as everything seems correct to me in my setup. Could this be a bug? Regards, Joe
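[For reference, a sketch of the copyField route Alexandre suggests above. The field names come from this thread; the type and attribute values are assumptions:]

<!-- hypothetical schema.xml fragment: index the value once, copy it into product-info -->
<field name="vulnerable-software" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="product-info" type="string" indexed="true" stored="true" multiValued="true"/>
<copyField source="vulnerable-software" dest="product-info"/>

[With this, only one DIH field entry is needed for the xpath; Solr copies the raw value into product-info at index time, and because the destination is stored it appears in query results.]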
Re: What is the recommended way to import and update index records?
OK - I did a little testing and with full-import and clean=false, I get more and more records when I import the same XML file. I have also checked and I see that my uniqueKey is defined correctly. Here are my fields in schema.xml (names and types lost in the archive are shown as "..."):

<field ... multiValued="true"/>
<field ... indexed="true" stored="true" multiValued="true"/>
<field ... stored="true" multiValued="true"/>
<field ... stored="true" multiValued="true"/>
<field ... stored="true"/>
<field ... stored="true"/>
<field ... stored="true"/>
<field ... stored="true"/>
<field ... stored="true"/>
<field ... indexed="true" stored="true"/>
<field ... stored="true"/>
<field ... indexed="true" stored="true"/>
<field ... indexed="true" stored="true"/>
<field ... indexed="true" stored="true"/>
<field ... stored="true" multiValued="true"/>
<field ... stored="true"/>

And here is uniqueKey in schema.xml:

<uniqueKey>id</uniqueKey>

Here is my rss-data-config.xml:

<dataConfig>
    <dataSource readTimeout="3"/>
    <document>
        <entity name="cve-2002" url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip" processor="XPathEntityProcessor" forEach="/nvd/entry" transformer="RegexTransformer">
            <field ... commonField="false"/>
            <field ... commonField="false"/>
            <field ... commonField="false"/>
            <field ... xpath="/nvd/entry/vulnerable-configuration/logical-test/fact-ref/@name" commonField="false"/>
            <field ... xpath="/nvd/entry/vulnerable-software-list/product" commonField="false"/>
            <field ... commonField="false" regex="cpe:/.:" replaceWith=""/>
            <field ... replaceWith=" "/>
            <field ... xpath="/nvd/entry/published-datetime" commonField="false"/>
            <field ... xpath="/nvd/entry/last-modified-datetime" commonField="false"/>
            <field ... commonField="false"/>
            <field ... xpath="/nvd/entry/cvss/base_metrics/score" commonField="false"/>
            <field ... xpath="/nvd/entry/cvss/base_metrics/access-vector" commonField="false"/>
            <field ... xpath="/nvd/entry/cvss/base_metrics/access-complexity" commonField="false"/>
            <field ... xpath="/nvd/entry/cvss/base_metrics/authentication" commonField="false"/>
            <field ... xpath="/nvd/entry/cvss/base_metrics/confidentiality-impact" commonField="false"/>
            <field ... xpath="/nvd/entry/cvss/base_metrics/integrity-impact" commonField="false"/>
            <field ... xpath="/nvd/entry/cvss/base_metrics/availability-impact" commonField="false"/>
            <field ... xpath="/nvd/entry/references/reference/@href" commonField="false"/>
            <field ... xpath="/nvd/entry/security-protection" commonField="false"/>
        </entity>
    </document>
</dataConfig>

Here is the import command the first time:

curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&entity=cve-2002&clean=true"

Here is the command that outputs the count of records:

curl "http://localhost:8983/solr/nvd-rss/select?wt=json&indent=true&q=*:*&start=0&&rows=0&fl=*"

And here is the output:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "fl":"*", "indent":"true", "start":"0", "q":"*:*", "wt":"json", "rows":"0"}},
  "response":{"numFound":6717,"start":0,"docs":[]
  }}

Now here is the next full-import command with clean=false:

curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&entity=cve-2002&clean=false"

And here is the new count:

curl "http://localhost:8983/solr/nvd-rss/select?wt=json&indent=true&q=*:*&start=0&&rows=0&fl=*"

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "fl":"*", "indent":"true", "start":"0", "q":"*:*", "wt":"json", "rows":"0"}},
  "response":{"numFound":13434,"start":0,"docs":[]
  }}

Clearly, this is just importing the same records twice. What is even more puzzling is that if I search for an id value which is unique in the imported XML, I get all records back: curl "http://localhost:8983/solr/nvd-rss/select?wt
Re: What is the recommended way to import and update index records?
Yep - it works with string. Thanks a lot! On 1/27/15, 7:08 PM, Alexandre Rafalovitch wrote: Make that id field a string and reindex. text_general is not the right type for a unique key. Regards, Alex.
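[A minimal sketch of the resulting schema.xml entries; the names are from this thread, while the required attribute is an assumption:]

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>

[Because string fields are not tokenized, the uniqueKey now matches exactly on re-import, so a document with an existing id replaces the old one instead of piling up duplicates.]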
Running multiple full-import commands via curl in a script
Hi, I am attempting to run all these curl commands from a script so that I can put them in a crontab job, however, it seems that only the first one executes and the other ones return with an error (below):

curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2002"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2003"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2004"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2005"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2006"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2007"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2008"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2009"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2010"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2011"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2012"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2013"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2014"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&entity=cve-2015"
curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=delta-import&clean=false&entity=cve-last"

error: A command is still running...

Question: Is there a way to queue the other requests in Solr so that they run as soon as the previous one is done? If not, how would you recommend I do this? Many thanks in advance, Joe
Re: Running multiple full-import commands via curl in a script
Thanks Mikhail - synchronous=true works like a charm...:)

On 1/28/15, 5:16 AM, Mikhail Khludnev wrote: Literally, a queue can be done by submitting as-is (async) and polling the command status. However, given https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataImportHandler.java#L200 you can try to add &synchronous=true&... which should hang the request until it's completed. The other question is how to run requests in parallel, which is explicitly prevented by https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataImportHandler.java#L173 The only workaround I can suggest is to duplicate the DIH definitions in the solr config ... then those guys should be able to handle their own requests in parallel. Nasty stuff.. have a good hack

On Wed, Jan 28, 2015 at 3:47 AM, Carl Roberts wrote: [...]
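[Putting the two messages together, a sketch of what the cron-friendly script might look like with synchronous=true. The host, core, and entity names are the ones used earlier in this thread; the loop itself is an assumption:]

#!/bin/sh
# Run each yearly DIH import in sequence; synchronous=true makes each
# request block until that import finishes, so they queue naturally.
for year in $(seq 2002 2015); do
  curl -s "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&synchronous=true&entity=cve-$year"
done
# Finish with the delta import for the most recent feed.
curl -s "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=delta-import&clean=false&synchronous=true&entity=cve-last"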
What is the best way to update an index?
Hi, What is the best way to update an index with new data or records? Via this command:

curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&clean=false&synchronous=true&entity=cve-2002"

or this command:

curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=delta-import&synchronous=true&entity=cve-2002"

Thanks, Joe
How do I unsubscribe?
How do I unsubscribe?
Errors using the Embedded Solr Server
Hi, I have downloaded the code and documentation for Solr version 4.10.3. I am trying to follow the SolrJ Wiki guide and I am running into errors. The latest error is this one:

Exception in thread "main" org.apache.solr.common.SolrException: No such core: db
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:112)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
    at solr.Test.main(Test.java:39)

My code is this:

package solr;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.SolrCore;

public class Test {
    public static void main(String[] args) {
        CoreContainer container = new CoreContainer("/Users/carlroberts/dev/solr-4.10.3");
        System.out.println(container.getDefaultCoreName());
        System.out.println(container.getSolrHome());
        container.load();
        System.out.println(container.isLoaded("db"));
        System.out.println(container.getCoreInitFailures());
        Collection<SolrCore> cores = container.getCores();
        System.out.println(cores);
        EmbeddedSolrServer server = new EmbeddedSolrServer(container, "db");
        SolrInputDocument doc1 = new SolrInputDocument();
        doc1.addField("id", "id1", 1.0f);
        doc1.addField("name", "doc1", 1.0f);
        doc1.addField("price", 10);
        SolrInputDocument doc2 = new SolrInputDocument();
        doc2.addField("id", "id2", 1.0f);
        doc2.addField("name", "doc2", 1.0f);
        doc2.addField("price", 20);
        Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
        docs.add(doc1);
        docs.add(doc2);
        try {
            server.add(docs);
            server.commit();
            server.deleteByQuery("*:*");
        } catch (IOException e) {
            e.printStackTrace();
        } catch (SolrServerException e) {
            e.printStackTrace();
        }
    }
}

My solr.xml file is this:

And my db/conf directory was copied from the example/solr/collection1/conf directory and it contains the solrconfig.xml file and schema.xml file. I have noticed that the documentation that shows how to use the EmbeddedSolrServer is outdated, as it indicates I should use the CoreContainer.Initializer class, which doesn't exist, and container.load(path, file), which also doesn't exist. At this point I have no idea why I am getting the "No such core" error. I have googled it and there seem to be tons of threads showing this error but for different reasons, and I have tried all the suggested resolutions and get nowhere with this. Can you please help? Regards, Joe
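[The solr.xml content did not survive in the archive above. As a placeholder, a sketch of the legacy 4.x format such a setup would typically use - every name and attribute here is an assumption, inferred only from the console output later in the thread (default core name=db, solr home=/Users/carlroberts/dev/solr-4.10.3/):]

<!-- hypothetical legacy-style solr.xml -->
<solr persistent="false">
  <cores adminPath="/admin/cores" defaultCoreName="db">
    <core name="db" instanceDir="db"/>
  </cores>
</solr>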
Re: Errors using the Embedded Solr Server
So far I have not been able to get the logging to work - here is what I get in the console prior to the exception:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
db
/Users/carlroberts/dev/solr-4.10.3/
false
{}
[]
/Users/carlroberts/dev/solr-4.10.3/

On 1/21/15, 11:50 AM, Alan Woodward wrote: That certainly looks like it ought to work. Is there log output that you could show us as well? Alan Woodward www.flax.co.uk

On 21 Jan 2015, at 16:09, Carl Roberts wrote: [...]
Re: Errors using the Embedded Solr Server
Hi, Could there be a bug in the EmbeddedSolrServer that is causing this? Is it still supported in version 4.10.3? If it is, can someone please provide me assistance with this? Regards, Joe On 1/21/15, 12:18 PM, Carl Roberts wrote: I had to hardcode the path in solrconfig.xml from this: ${solr.install.dir:} to this: /Users/carlroberts/dev/solr-4.10.3/ to avoid the classloader warnings, but I still get the same error. I am not sure where the ${solr.install.dir:} value gets pulled from but apparently that is not working. Here is the new output: [main] INFO org.apache.solr.core.SolrResourceLoader - new SolrResourceLoader for directory: '/Users/carlroberts/dev/solr-4.10.3/' [main] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/lib/commons-logging-1.2.jar' to classloader [main] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/lib/servlet-api.jar' to classloader [main] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/lib/slf4j-simple-1.7.5.jar' to classloader [main] INFO org.apache.solr.core.ConfigSolr - Loading container configuration from /Users/carlroberts/dev/solr-4.10.3/solr.xml [main] INFO org.apache.solr.core.CoreContainer - New CoreContainer 1023143764 [main] INFO org.apache.solr.core.CoreContainer - Loading cores into CoreContainer [instanceDir=/Users/carlroberts/dev/solr-4.10.3/] db /Users/carlroberts/dev/solr-4.10.3/ [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting socketTimeout to: 0 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting urlScheme to: null [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting connTimeout to: 0 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting maxConnectionsPerHost to: 20 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting corePoolSize to: 0 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting maximumPoolSize to: 2147483647 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting maxThreadIdleTime to: 5 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting sizeOfQueue to: -1 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting fairnessPolicy to: false [main] INFO org.apache.solr.update.UpdateShardHandler - Creating UpdateShardHandler HTTP client with params: socketTimeout=0&connTimeout=0&retry=false [main] INFO org.apache.solr.logging.LogWatcher - SLF4J impl is org.slf4j.impl.SimpleLoggerFactory [main] INFO org.apache.solr.logging.LogWatcher - No LogWatcher configured [main] INFO org.apache.solr.core.CoreContainer - Host Name: null [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - new SolrResourceLoader for directory: '/Users/carlroberts/dev/solr-4.10.3/db/' [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrConfig - Adding specified lib dirs to ClassLoader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/apache-mime4j-core-0.7.2.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/apache-mime4j-dom-0.7.2.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 
'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/aspectjrt-1.6.11.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/bcmail-jdk15-1.45.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/bcprov-jdk15-1.45.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/boilerpipe-1.1.0.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/commons-compress-1.7.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/dom4j-1.6.1.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/fontbox-1.8.4.jar' to classloader [coreLoadExecutor-5-t
Re: Errors using the Embedded Solr Server
[coreLoadExecutor-5-thread-1] INFO org.apache.solr.schema.IndexSchema - Reading Solr Schema from /Users/carlroberts/dev/solr-4.10.3/db/conf/schema.xml
[coreLoadExecutor-5-thread-1] INFO org.apache.solr.schema.IndexSchema - [db] Schema name=example
false
{}
[]
/Users/carlroberts/dev/solr-4.10.3/

Exception in thread "main" org.apache.solr.common.SolrException: No such core: db
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:112)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
    at solr.Test.main(Test.java:40)

On 1/21/15, 11:50 AM, Alan Woodward wrote: That certainly looks like it ought to work. Is there log output that you could show us as well? Alan Woodward www.flax.co.uk

On 21 Jan 2015, at 16:09, Carl Roberts wrote: [...]
Re: Errors using the Embedded Solr Server
b/hppc-0.5.2.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/clustering/lib/jackson-core-asl-1.9.13.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/clustering/lib/jackson-mapper-asl-1.9.13.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/clustering/lib/mahout-collections-1.0.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/clustering/lib/mahout-math-0.6.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/clustering/lib/simple-xml-2.7.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/dist/solr-clustering-4.10.3.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/langid/lib/jsonic-1.2.7.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/langid/lib/langdetect-1.1-20120112.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/dist/solr-langid-4.10.3.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/velocity/lib/commons-beanutils-1.8.3.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/velocity/lib/commons-collections-3.2.1.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/velocity/lib/velocity-1.7.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/velocity/lib/velocity-tools-2.0.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/dist/solr-velocity-4.10.3.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.update.SolrIndexConfig - IndexWriter infoStream solr logging is enabled [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrConfig - Using Lucene MatchVersion: 4.10.3 [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.Config - Loaded SolrConfig: solrconfig.xml [coreLoadExecutor-5-thread-1] INFO org.apache.solr.schema.IndexSchema - Reading Solr Schema from /Users/carlroberts/dev/solr-4.10.3/db/conf/schema.xml [coreLoadExecutor-5-thread-1] INFO org.apache.solr.schema.IndexSchema - [db] Schema name=example false {} [] /Users/carlroberts/dev/solr-4.10.3/ Exception in thread "main" org.apache.solr.common.SolrException: No such core: db at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:112) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124) at 
org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54) at solr.Test.main(Test.java:40) On 1/21/15, 12:01 PM, Carl Roberts wrote: OK - I figured out the logging. Here is the logging output plus the console output and the stack trace: main] INFO org.apache.solr.core.SolrResourceLoader - new SolrResourceLoader for directory: '/Users/carlroberts/dev/solr-4.10.3/' [main] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/lib/commons-logging-1.2.jar' to classloader [main] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/lib/servlet-api.jar' to classloader [main] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/lib/slf4j-simple-1.7.5.jar' to classloader [main] INFO org.apache.solr.core.ConfigSolr - Loading container configuration from /Users/carlroberts/dev/solr-4.10.3/solr.xml [main] INFO org.apache.solr.core.CoreContainer - New CoreContainer 2050551931 db /Users/carlroberts/dev/solr-4.10.3/[main] INFO org.apache.solr.core.CoreContainer -
Re: Errors using the Embedded Solr Server
g.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/clustering/lib/jackson-mapper-asl-1.9.13.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/clustering/lib/mahout-collections-1.0.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/clustering/lib/mahout-math-0.6.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/clustering/lib/simple-xml-2.7.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/dist/solr-clustering-4.10.3.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/langid/lib/jsonic-1.2.7.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/langid/lib/langdetect-1.1-20120112.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/dist/solr-langid-4.10.3.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/velocity/lib/commons-beanutils-1.8.3.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/velocity/lib/commons-collections-3.2.1.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/velocity/lib/velocity-1.7.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/velocity/lib/velocity-tools-2.0.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/dist/solr-velocity-4.10.3.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.update.SolrIndexConfig - IndexWriter infoStream solr logging is enabled [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrConfig - Using Lucene MatchVersion: 4.10.3 [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.Config - Loaded SolrConfig: solrconfig.xml [coreLoadExecutor-5-thread-1] INFO org.apache.solr.schema.IndexSchema - Reading Solr Schema from /Users/carlroberts/dev/solr-4.10.3/db/conf/schema.xml [coreLoadExecutor-5-thread-1] INFO org.apache.solr.schema.IndexSchema - [db] Schema name=example default core name=db solr home=/Users/carlroberts/dev/solr-4.10.3/ db is loaded=false core init failures={} cores=[] Exception in thread "main" org.apache.solr.common.SolrException: No such core: db at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:112) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54) at solr.Test.main(Test.java:38) On 1/21/15, 12:31 PM, Alan Woodward wrote: Ah, OK, you need to include a 
logging jar in your classpath - the log4j and slf4j-log4j jars in the solr distribution will help here. Once you've got some logging set up, then you should be able to work out what's going wrong! Alan Woodward www.flax.co.uk

On 21 Jan 2015, at 16:53, Carl Roberts wrote: [...]
Is Solr a good candidate to index 100s of nodes in one XML file?
Hi, Is Solr a good candidate to index 100s of nodes in one XML file? I have an RSS feed XML file that has 100s of nodes with several elements in each node that I have to index, so I was planning to parse the XML with Stax and extract the data from each node and add it to Solr. There will always be only one file to start with and then a second file as the RSS feed supplies updates. I want to return certain fields of each node when I search certain fields of the same node. Is Solr overkill in this case? Should I just use Lucene instead? Regards, Joe
Re: Errors using the Embedded Solr Server
Ah - OK - let me try that. BTW - I applied the fix from the bug link you gave me to log the errors and I am now at least getting the actual errors:

default core name=db
solr home=/Users/carlroberts/dev/solr-4.10.3/
db is loaded=false
core init failures={db=org.apache.solr.core.CoreContainer$CoreLoadFailure@4d351f9b}
cores=[]
Exception in thread "main" org.apache.solr.common.SolrException: SolrCore 'db' is not available due to init failure: JVM Error creating core [db]: org/apache/lucene/queries/function/ValueSource
    at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:749)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:110)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)
    at solr.Test.main(Test.java:38)
Caused by: org.apache.solr.common.SolrException: JVM Error creating core [db]: org/apache/lucene/queries/function/ValueSource
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:508)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:255)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:249)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: org/apache/lucene/queries/function/ValueSource
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:274)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:484)
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:521)
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:517)
    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:81)
    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:486)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:166)
    at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
    at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
    at org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:90)
    at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:62)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:489)
    ... 6 more
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.queries.function.ValueSource
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 21 more

On 1/21/15, 7:32 PM, Shawn Heisey wrote: On 1/21/2015 5:16 PM, Carl Roberts wrote: BTW - it seems that it is very hard to get started with the Embedded server. The doc is out of date. The code seems to be untested and buggy.
On 1/21/15, 7:15 PM, Carl Roberts wrote: Hmmm... It looks like FutureTask is calling setException(Throwable t) with this exception, which is not making it to the console. What I don't understand is why it is throwing that exception. I made sure that I added the lucene-queries-4.10.3.jar file to the classpath by adding it to the solr home directory. See the new tracing:

I'm pretty sure that all the lucene jars need to be available *before* Solr reaches the point in the log that you have quoted, where it adds jars from ${solr.solr.home}/lib. This would be the same location where the solrj and solr-core jars live. The only kind of jars that should be in the solr home lib directory are extra jars for extra features that you might specify in schema.xml (or some places in solrconfig.xml), like the ICU analysis jars, tika, mysql, etc. Thanks, Shawn
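[To make Shawn's point concrete, a sketch of a launch command with the lucene/solr jars on the JVM classpath up front rather than in the solr home lib directory. The jar locations below are assumptions for illustration, not the actual layout from this thread:]

#!/bin/sh
# hypothetical launch script for the Test class above
SOLR=/Users/carlroberts/dev/solr-4.10.3
# solr-core, solrj, the lucene jars they depend on (wherever those were
# unpacked - hypothetical path), and the slf4j/log4j logging jars all go
# on the startup classpath, not in ${solr.solr.home}/lib:
java -cp "bin:$SOLR/dist/solr-core-4.10.3.jar:$SOLR/dist/solr-solrj-4.10.3.jar:$SOLR/lucene-libs/*:$SOLR/example/lib/ext/*" solr.Test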
Re: Errors using the Embedded Solr Server
ngInfoStream - [IW][main]: now flush at close [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: start flush: applyAllDeletes=true [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: index before flush [main] INFO org.apache.solr.update.LoggingInfoStream - [DW][main]: startFullFlush [main] INFO org.apache.solr.update.LoggingInfoStream - [DW][main]: main finishFullFlush success=true [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: apply all deletes during flush [main] INFO org.apache.solr.update.LoggingInfoStream - [BD][main]: prune sis=segments_3: minGen=9223372036854775807 packetCount=0 [main] INFO org.apache.solr.update.LoggingInfoStream - [CMS][main]: now merge [main] INFO org.apache.solr.update.LoggingInfoStream - [CMS][main]: index: [main] INFO org.apache.solr.update.LoggingInfoStream - [CMS][main]: no more merges pending; now return [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: waitForMerges [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: waitForMerges done [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: commit: start [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: commit: enter lock [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: commit: now prepare [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: prepareCommit: flush [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: index before flush [main] INFO org.apache.solr.update.LoggingInfoStream - [DW][main]: startFullFlush [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: apply all deletes during flush [main] INFO org.apache.solr.update.LoggingInfoStream - [BD][main]: prune sis=segments_3: minGen=9223372036854775807 packetCount=0 [main] INFO org.apache.solr.update.LoggingInfoStream - [DW][main]: main finishFullFlush success=true [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: startCommit(): start [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: startCommit index= changeCount=4 [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: done all syncs: [] [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: commit: pendingCommit != null [main] INFO org.apache.solr.update.LoggingInfoStream - [IFD][main]: now checkpoint "" [0 segments ; isCommit = true] [main] INFO org.apache.solr.core.SolrCore - SolrDeletionPolicy.onCommit: commits: num=2 commit{dir=NRTCachingDirectory(MMapDirectory@/Users/carlroberts/dev/solr-4.10.3/db/data/index lockFactory=NativeFSLockFactory@/Users/carlroberts/dev/solr-4.10.3/db/data/index; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_4,generation=4} commit{dir=NRTCachingDirectory(MMapDirectory@/Users/carlroberts/dev/solr-4.10.3/db/data/index lockFactory=NativeFSLockFactory@/Users/carlroberts/dev/solr-4.10.3/db/data/index; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_5,generation=5} [main] INFO org.apache.solr.core.SolrCore - newest commit generation = 5 [main] INFO org.apache.solr.update.LoggingInfoStream - [IFD][main]: deleteCommits: now decRef commit "segments_4" [main] INFO org.apache.solr.update.LoggingInfoStream - [IFD][main]: delete "segments_4" [main] INFO org.apache.solr.update.LoggingInfoStream - [IFD][main]: 0 msec to checkpoint [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: commit: wrote segments file "segments_5" [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: commit: took 5.4 msec [main] INFO 
org.apache.solr.update.LoggingInfoStream - [IW][main]: commit: done [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: rollback [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: all running merges have aborted [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: rollback: done finish merges [main] INFO org.apache.solr.update.LoggingInfoStream - [DW][main]: abort [main] INFO org.apache.solr.update.LoggingInfoStream - [DW][main]: done abort; abortedFiles=[] success=true [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: rollback: infos= [main] INFO org.apache.solr.update.LoggingInfoStream - [IFD][main]: now checkpoint "" [0 segments ; isCommit = false] [main] INFO org.apache.solr.update.LoggingInfoStream - [IFD][main]: 0 msec to checkpoint [main] INFO org.apache.solr.core.SolrCore - [db] Closing main searcher on request. [main] INFO org.apache.solr.core.CachingDirectoryFactory - Closing NRTCachingDirectoryFactory - 2 directories currently being tracked [main] INFO org.apache.solr.core.CachingDirectoryFactory - looking to close /Users/carlroberts/dev/solr-4.10.3/db/data/index [CachedDir<>] [main] INFO org.apache.solr.core.CachingDirectoryFactory - Closing directory: /Users/carlroberts/dev/solr-4.10.3/db/data/index [main] INFO org.apache.solr.core.CachingDirectory
Re: Errors using the Embedded Solr Server
Got it all working...:) I just replaced the solrconfig.xml and schema.xml files that I was using with the ones from collection1 in one of the examples. I had modified those files to remove certain sections which I thought were not needed and apparently I don't understand those files very well yet...:) Many thanks, Joe

On 1/21/15, 8:47 PM, Carl Roberts wrote: Hi Shawn, Many thanks for all your help. Moving the lucene JARs from solr.solr.home/lib to the same classpath directory as the solr JARs, plus adding a bunch more dependency JAR files and most of the files from the collection1/conf directory (these ones, to be exact), has me a lot closer to my goal:

-rw-r--r--   1 carlroberts staff    38 Jan 21 20:41 _rest_managed.json
-rw-r--r--   1 carlroberts staff    56 Jan 21 20:41 _schema_analysis_stopwords_english.json
-rw-r--r--   1 carlroberts staff  4041 Dec 10 00:37 currency.xml
-rw-r--r--   1 carlroberts staff  1386 Dec 10 00:37 elevate.xml
drwxr-xr-x  41 carlroberts staff  1394 Dec 10 00:37 lang
-rw-r--r--   1 carlroberts staff   894 Dec 10 00:37 protwords.txt
-rw-r--r--@  1 carlroberts staff 62063 Jan 21 13:02 schema.xml
-rw-r--r--@  1 carlroberts staff 76821 Jan 21 13:03 solrconfig.xml
-rw-r--r--   1 carlroberts staff    16 Dec 10 00:37 spellings.txt
-rw-r--r--   1 carlroberts staff   795 Dec 10 00:37 stopwords.txt
-rw-r--r--   1 carlroberts staff  1148 Dec 10 00:37 synonyms.txt

I am now getting this:

[main] INFO org.apache.solr.core.SolrResourceLoader - new SolrResourceLoader for directory: '/Users/carlroberts/dev/solr-4.10.3/'
[main] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/lib/commons-logging-1.2.jar' to classloader
[main] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/lib/servlet-api.jar' to classloader
[main] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/lib/slf4j-simple-1.7.5.jar' to classloader
[main] INFO org.apache.solr.core.ConfigSolr - Loading container configuration from /Users/carlroberts/dev/solr-4.10.3/solr.xml
[main] INFO org.apache.solr.core.CoreContainer - New CoreContainer 139145087
[main] INFO org.apache.solr.core.CoreContainer - Loading cores into CoreContainer [instanceDir=/Users/carlroberts/dev/solr-4.10.3/]
[main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting socketTimeout to: 0
[main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting urlScheme to: null
[main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting connTimeout to: 0
[main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting maxConnectionsPerHost to: 20
[main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting corePoolSize to: 0
[main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting maximumPoolSize to: 2147483647
[main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting maxThreadIdleTime to: 5
[main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting sizeOfQueue to: -1
[main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting fairnessPolicy to: false
[main] INFO org.apache.solr.update.UpdateShardHandler - Creating UpdateShardHandler HTTP client with params: socketTimeout=0&connTimeout=0&retry=false
[main] INFO org.apache.solr.logging.LogWatcher - SLF4J impl is org.slf4j.impl.SimpleLoggerFactory
[main] INFO org.apache.solr.logging.LogWatcher - No LogWatcher configured
[main] INFO
org.apache.solr.core.CoreContainer - Host Name: null [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - new SolrResourceLoader for directory: '/Users/carlroberts/dev/solr-4.10.3/db/' [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrConfig - Adding specified lib dirs to ClassLoader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/apache-mime4j-core-0.7.2.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/apache-mime4j-dom-0.7.2.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/aspectjrt-1.6.11.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/solr-4.10.3/contrib/extraction/lib/bcmail-jdk15-1.45.jar' to classloader [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/Users/carlroberts/dev/
Is there a way to pass in proxy settings to Solr?
Hi, Is there a way to pass in proxy settings to Solr? The reason that I am asking this question is that I am trying to run the DIH RSS example, and it is not working when I try to import the RSS feed URL because the code in Solr comes back with an unknown host exception due to the proxy that we use at work. If I use the curl tool and the environment variable http_proxy to access the RSS feed directly it works, but it appears Solr does not use that environment variable because it is throwing this error: 39642 [Thread-15] ERROR org.apache.solr.handler.dataimport.URLDataSource – Exception thrown while getting data java.net.UnknownHostException: rss.slashdot.org at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at sun.net.NetworkClient.doConnect(NetworkClient.java:175) at sun.net.www.http.HttpClient.openServer(HttpClient.java:432) at sun.net.www.http.HttpClient.openServer(HttpClient.java:527) at sun.net.www.http.HttpClient.(HttpClient.java:211) at sun.net.www.http.HttpClient.New(HttpClient.java:308) at sun.net.www.http.HttpClient.New(HttpClient.java:326) at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:996) at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932) at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:850) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1300) at org.apache.solr.handler.dataimport.URLDataSource.getData(URLDataSource.java:98) at org.apache.solr.handler.dataimport.URLDataSource.getData(URLDataSource.java:42) at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:283) at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:224) at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:204) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461) Thanks in advance, Joe
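[DIH's URLDataSource fetches feeds through java.net.HttpURLConnection, which ignores the http_proxy environment variable; the JVM's standard proxy system properties usually do the trick instead. A sketch, assuming a proxy at proxy.example.com:8080 - the host and port are placeholders:]

java -Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080 \
     -Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=8080 \
     -jar start.jar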
Re: Is Solr a good candidate to index 100s of nodes in one XML file?
Thanks. I am looking at the RSS DIH example right now.

On 1/21/15, 3:15 PM, Alexandre Rafalovitch wrote: Solr is just fine for this. It even ships with an example of how to read an RSS file under the DIH directory. DIH is also most likely what you will use for the first implementation. You don't need to worry about Stax or anything, unless your file format is very weird or has overlapping namespaces (the DIH XML parser does not care about namespaces). Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 21 January 2015 at 14:53, Carl Roberts wrote: [...]
Re: Is Solr a good candidate to index 100s of nodes in one XML file?
Thanks for the input. I think one benefit of using Solr is also that I can provide a REST API to search the indexed records. Regards, Joe

On 1/21/15, 3:17 PM, Shawn Heisey wrote: On 1/21/2015 12:53 PM, Carl Roberts wrote: [...]

Effectively, Solr *is* Lucene. You edit configuration files instead of writing Lucene code, because Solr is a fully customizable search server, not a programming API. That also means that it's not as flexible as Lucene ... but it's a lot easier. If you're capable of writing Lucene code, chances are that you'll be able to write an application that is highly tailored to your situation and will have better performance than Solr ... but you'll be writing the entire program yourself. Solr lets you install an existing program and just change the configuration. Thanks, Shawn
How do you query a sentence composed of multiple words in a description field?
Hi,

How do you query a sentence composed of multiple words in a description field? I want to search for the phrase "Oracle Fusion Middleware", but when I try the following search query in curl, I get nothing:

curl "http://localhost:8983/solr/nvd-rss/select?q=summary:Oracle Fusion Middleware&wt=xml&indent=true"

If I instead use "Oracle+Fusion+Middleware", I get hits with Oracle or Fusion or Middleware, but not just the ones with the string "Oracle Fusion Middleware". This is the response (responseHeader: status=0, QTime=1; params: indent=true, q=summary:Oracle Fusion Middleware, wt=xml):

CVE-2014-6526: Unspecified vulnerability in the Oracle Directory Server Enterprise Edition component in Oracle Fusion Middleware 7.0 allows remote attackers to affect integrity via unknown vectors related to Admin Console. (http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-6526)
CVE-2014-6548: Unspecified vulnerability in the Oracle SOA Suite component in Oracle Fusion Middleware 11.1.1.7 allows local users to affect confidentiality, integrity, and availability via vectors related to B2B Engine. (http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-6548)
CVE-2014-6580: Unspecified vulnerability in the Oracle Reports Developer component in Oracle Fusion Middleware 11.1.1.7 and 11.1.2.2 allows remote attackers to affect integrity via unknown vectors. (http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-6580)
CVE-2014-6594: Unspecified vulnerability in the Oracle iLearning component in Oracle iLearning 6.0 and 6.1 allows remote attackers to affect confidentiality via unknown vectors related to Learner Pages. (http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-6594)
CVE-2015-0372: Unspecified vulnerability in the Oracle Containers for J2EE component in Oracle Fusion Middleware 10.1.3.5 allows remote attackers to affect confidentiality via unknown vectors. (http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2015-0372)
CVE-2015-0376: Unspecified vulnerability in the Oracle WebCenter Content component in Oracle Fusion Middleware 11.1.1.8.0 allows remote attackers to affect integrity via unknown vectors related to Content Server. (http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2015-0376)
CVE-2015-0420: Unspecified vulnerability in the Oracle Forms component in Oracle Fusion Middleware 11.1.1.7 and 11.1.2.2 allows remote attackers to affect confidentiality via unknown vectors related to Forms Services. (http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2015-0420)
CVE-2015-0436: Unspecified vulnerability in the Oracle iLearning component in Oracle iLearning 6.0 and 6.1 allows remote attackers to affect confidentiality via unknown vectors related to Login. (http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2015-0436)
CVE-2014-6525: Unspecified vulnerability in the Oracle Web Applications Desktop Integrator component in Oracle E-Business Suite 11.5.10.2, 12.0.6, 12.1.3, 12.2.2, 12.2.3, and 12.2.4 allows remote authenticated users to affect integrity via unknown vectors related to Templates. (http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-6525)
CVE-2014-6556: Unspecified vulnerability in the Oracle Applications DBA component in Oracle E-Business Suite 11.5.10.2, 12.0.6, 12.1.3, 12.2.2, 12.2.3, and 12.2.4 allows remote authenticated users to affect confidentiality, integrity, and availability via vectors related to AD_DDL. (http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2014-6556)
Re: How do you query a sentence composed of multiple words in a description field?
Hi Walter,

If I try this from my Mac shell:

curl http://localhost:8983/solr/nvd-rss/select?wt=json&indent=true&q=summary:"Oracle Fusion"

I don't get a response. If I try this, it works!:

curl "http://localhost:8983/solr/nvd-rss/select?wt=json&indent=true&q=name:Oracle"

So I think the entire curl URL needs to be in quotes on the command line, and my problem is that I do not know how to put the URL in quotes and then put the field value in quotes inside that. BTW - if I try the first URL from a browser, it works just fine. Any suggestions?

On 1/22/15, 5:54 PM, Walter Underwood wrote:
Your query is this:

summary:Oracle Fusion Middleware

That searches for "Oracle" in the summary field, and "Fusion" and "Middleware" in whatever your default field is. You want:

summary:"Oracle Fusion Middleware"

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/

On Jan 22, 2015, at 2:47 PM, Carl Roberts wrote:
Hi, How do you query a sentence composed of multiple words in a description field? I want to search for the phrase "Oracle Fusion Middleware", but when I try the following search query in curl, I get nothing: curl "http://localhost:8983/solr/nvd-rss/select?q=summary:Oracle Fusion Middleware&wt=xml&indent=true" If I instead use "Oracle+Fusion+Middleware", I get hits with Oracle or Fusion or Middleware, but not just the ones with the string "Oracle Fusion Middleware". [response snipped]
Re: How do you query a sentence composed of multiple words in a description field?
Thanks Shawn - I tried this but it does not work. I don't even get a response from curl when I try that format, and when I look at the logging on the console for Jetty I don't see anything new - it seems that the request is not even making it to the server.

On 1/22/15, 6:43 PM, Shawn Heisey wrote:
On 1/22/2015 4:31 PM, Carl Roberts wrote:
Hi Walter, If I try this from my Mac shell: curl http://localhost:8983/solr/nvd-rss/select?wt=json&indent=true&q=summary:"Oracle Fusion" I don't get a response.
Quotes are special characters to the shell on your Mac, and get removed from what the curl command sees. You'll need to put the whole thing in quotes (so that characters like & are not interpreted by the shell) and then escape the quotes that you want to actually be handled by curl:

curl "http://localhost:8983/solr/nvd-rss/select?wt=json&indent=true&q=summary:\"Oracle Fusion\""

Thanks, Shawn
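(A variant that sidesteps the nested-quote escaping altogether is to URL-encode the inner quotes as %22 and the spaces as +, so only one pair of shell quotes is needed:

    curl "http://localhost:8983/solr/nvd-rss/select?wt=json&indent=true&q=summary:%22Oracle+Fusion%22"

The server decodes %22 back into the quote characters that make it a phrase query.)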
Re: How do you query a sentence composed of multiple words in a description field?
Thanks Erick, I think I am going to start using the browser for testing...:) Perhaps also a REST client for the Mac. Regards, Joe

On 1/22/15, 6:56 PM, Erick Erickson wrote:
Have you considered using the admin/query form? Lots of escaping is done there for you. Once you have the form of the query down and know what to expect, it's probably easier to enter "escaping hell" with curl and the like.

And what is your schema definition for the field in question? The admin/analysis page can help a lot here.

Best, Erick

On Thu, Jan 22, 2015 at 3:51 PM, Carl Roberts wrote:
Thanks Shawn - I tried this but it does not work. I don't even get a response from curl when I try that format, and when I look at the logging on the console for Jetty I don't see anything new - it seems that the request is not even making it to the server.
On 1/22/15, 6:43 PM, Shawn Heisey wrote: [rest of quote snipped]
Re: Is Solr a good candidate to index 100s of nodes in one XML file?
I got the RSS DIH example to work with my own RSS feed and it works great - thanks for the help.

On 1/22/15, 11:20 AM, Carl Roberts wrote:
Thanks. I am looking at the RSS DIH example right now.
On 1/21/15, 3:15 PM, Alexandre Rafalovitch wrote: [rest of quote snipped]
Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?
Hi,

I have the RSS DIH example working with my own RSS feed - here is the configuration for it:

<entity ... url="https://nvd.nist.gov/download/nvd-rss.xml"
        processor="XPathEntityProcessor"
        forEach="/RDF/item"
        transformer="DateFormatTransformer">
  <field ... commonField="true"/>
  <field ... commonField="true"/>
  <field ... commonField="true"/>
  <field ... commonField="true"/>
</entity>

However, my problem is that I also have to load multiple XML feeds into the same core. Here is one example (there are about 10 of them):

http://static.nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2014.xml.zip

Is there any built-in functionality that would allow me to do this? Basically, the use case is to load and index all the XML ZIP files first, and then check the RSS feed every two hours and update the indexes with any new ones.

Regards, Joe
Re: Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?
Hi Alex,

If I am understanding this correctly, I can define multiple entities like this? ...

How would I trigger loading certain entities during start? How would I trigger loading other entities during update? Is there a way to set an auto-update for certain entities so that I don't have to invoke an update via curl? Where / how do I specify the preImportDeleteQuery to avoid deleting everything upon each update? Is there an example or doc that shows how to do all this?

Regards, Joe

On 1/23/15, 11:24 AM, Alexandre Rafalovitch wrote:
You can define both multiple entities in the same file and nested entities if your list comes from an external source (e.g. a text file of URLs). You can also trigger DIH with the name of a specific entity to load just that one. You can even pass a DIH configuration file when you are triggering the processing start, so you can have completely different files for the initial load and for the update. Though you can also just do the same with entities.

The only thing to be aware of is that before an entity definition is processed, a delete command is run. By default, it's "delete all", so executing one entity will delete everything and then populate just that one entity's results. You can avoid that by defining preImportDeleteQuery and having a clear identifier on the content generated by each entity (e.g. a source field, either extracted or manually added with TemplateTransformer).

Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 23 January 2015 at 11:15, Carl Roberts wrote:
Hi, I have the RSS DIH example working with my own RSS feed - here is the configuration for it. [config and rest of quote snipped]
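(To make Alex's answer concrete, here is a sketch of what such a multi-entity rss-data-config.xml could look like. The entity names and the source field are illustrative, not from the original post:

    <dataConfig>
      <dataSource type="URLDataSource"/>
      <document>
        <entity name="nvd-rss" preImportDeleteQuery="source:rss"
                url="https://nvd.nist.gov/download/nvd-rss.xml"
                processor="XPathEntityProcessor" forEach="/RDF/item"
                transformer="TemplateTransformer">
          <!-- stamp each document so preImportDeleteQuery only deletes this entity's docs -->
          <field column="source" template="rss"/>
          <!-- ...plus the regular field mappings... -->
        </entity>
        <entity name="nvd-2014" preImportDeleteQuery="source:2014"
                url="http://static.nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2014.xml.zip"
                processor="XPathEntityProcessor" forEach="/nvd/entry"
                transformer="TemplateTransformer">
          <field column="source" template="2014"/>
          <!-- the .zip URL would still need the unzipping discussed later in the thread -->
        </entity>
      </document>
    </dataConfig>

A single entity is then imported with the entity request parameter, e.g. curl "http://localhost:8983/solr/nvd-rss/dataimport?command=full-import&entity=nvd-rss".)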
Sporadic Socket Timeout Error during Import
Hi,

I am using the DIH RSS example and I am running into a sporadic socket timeout error during every 3rd or 4th request. Below is the stack trace. What is the default socket timeout for reads, and how can I increase it?

15046 [Thread-17] ERROR org.apache.solr.handler.dataimport.URLDataSource – Exception thrown while getting data
java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:152)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
	at sun.security.ssl.InputRecord.read(InputRecord.java:480)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)
	at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)
	at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
	at org.apache.solr.handler.dataimport.URLDataSource.getData(URLDataSource.java:98)
	at org.apache.solr.handler.dataimport.URLDataSource.getData(URLDataSource.java:42)
	at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:283)
	at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:224)
	at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:204)
	at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
	at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
	at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
	at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)

815049 [Thread-17] ERROR org.apache.solr.handler.dataimport.DocBuilder – Exception while processing: nvd-rss document : SolrInputDocument(fields: []):org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in invoking url https://nvd.nist.gov/download/nvd-rss.xml Processing Document # 1
	at org.apache.solr.handler.dataimport.URLDataSource.getData(URLDataSource.java:115)
	at org.apache.solr.handler.dataimport.URLDataSource.getData(URLDataSource.java:42)
	at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:283)
	at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:224)
	at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:204)
	at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
	at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
	at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
	at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
	at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:416)
	at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:480)
	at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:461)
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:152)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at sun.security.ssl.InputRecord.readFully(InputRecord.java:442)
	at sun.security.ssl.InputRecord.read(InputRecord.java:480)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:927)
	at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:884)
	at sun.security.ssl.AppInputStream.read(AppInputStream.java:102)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
	at sun.net.www.protocol.http.HttpURLConnection.g
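(For what it's worth, the read timeout in this trace comes from URLDataSource, whose documented defaults are a 5000 ms connection timeout and a 10000 ms read timeout. Both can be raised on the dataSource element in the DIH config; the values below are illustrative, in milliseconds:

    <dataSource type="URLDataSource" connectionTimeout="10000" readTimeout="30000"/>)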
Re: Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?
OK - thanks for the doc.

Is it possible to just provide an empty value for preImportDeleteQuery to disable the delete prior to import? Will the data still be deleted for each entity during a delta-import instead of a full-import? Is there any capability in the handler to unzip an XML file from a URL prior to reading it, or can I perhaps hook in a custom pre-processing handler?

Regards, Joe

On 1/23/15, 1:40 PM, Alexandre Rafalovitch wrote:
https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler

The Admin UI has the interface, so you can play there once you define it. You do have to use curl; there is no built-in scheduler.

Regards, Alex. Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 23 January 2015 at 13:29, Carl Roberts wrote:
Hi Alex, If I am understanding this correctly, I can define multiple entities like this? ... [rest of quote snipped]
Re: Is it possible to read multiple RSS feeds and XML Zip file feeds with DIH into one core?
Excellent - thanks Shalin. But how does delta-import work? Does it also do a clean? Does it require a unique id? Does it update existing records and only add when necessary? And how would I go about unzipping the content from a URL in order to then import the unzipped XML? Is the recommended way to extend the URLDataSource class, or is there built-in logic to plug in pre-processing handlers?

On 1/23/15, 2:39 PM, Shalin Shekhar Mangar wrote:
If you add clean=false as a parameter to the full-import, then deletion is disabled. Since you are ingesting RSS, there is no need for deletion at all, I guess.

On Fri, Jan 23, 2015 at 7:31 PM, Carl Roberts wrote:
OK - thanks for the doc. Is it possible to just provide an empty value for preImportDeleteQuery to disable the delete prior to import? Will the data still be deleted for each entity during a delta-import instead of a full-import? Is there any capability in the handler to unzip an XML file from a URL prior to reading it, or can I perhaps hook in a custom pre-processing handler? Regards, Joe
[rest of quote snipped]
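(A quick sketch of the update call Shalin describes, reusing the core and entity names from earlier in the thread: with a uniqueKey defined in schema.xml, re-running a full-import with clean=false overwrites documents whose key already exists and adds the rest, so a scheduled curl every two hours would look like:

    curl "http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import&entity=nvd-rss&clean=false"

Delta-import proper is geared toward SQL/database entities with deltaQuery definitions; for URL/XPath entities, a clean=false full-import is the usual substitute.)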
Fwd: Need Help with custom ZIPURLDataSource class
Hi,

I created a custom ZIPURLDataSource class to unzip the content from an HTTP URL for an XML ZIP file, and it seems to be working (at least I have no errors), but no data is imported. Here is my configuration in rss-data-config.xml:

<entity ... url="https://nvd.nist.gov/feeds/xml/cve/nvdcve-2.0-2002.xml.zip"
        processor="XPathEntityProcessor"
        forEach="/nvd/entry"
        transformer="DateFormatTransformer">
  ...
</entity>

Attached is the ZIPURLDataSource.java file. It actually unzips and saves the raw XML to disk, which I have verified to be a valid XML file. The file has one or more entries; here is an example (a feed entry in the http://scap.nist.gov/schema/feed/vulnerability/2.0 namespace, nvd_xml_version="2.0", pub_date="2015-01-10T05:37:05"):

CVE-1999-0001
  vulnerable-software-list: cpe:/o:freebsd:freebsd:{1.0, 1.1, 1.1.5.1, 1.2, 2.0, 2.0.1, 2.0.5, 2.1.5, 2.1.6, 2.1.6.1, 2.1.7, 2.1.7.1, 2.2, 2.2.2, 2.2.3, 2.2.4, 2.2.5, 2.2.6, 2.2.8, 3.0}, cpe:/o:openbsd:openbsd:{2.3, 2.4}, cpe:/o:bsdi:bsd_os:3.1
  published: 1999-12-30T00:00:00.000-05:00; last modified: 2010-12-16T00:00:00.000-05:00
  CVSS: score 5.0, NETWORK / LOW / NONE, impact NONE / NONE / PARTIAL (source http://nvd.nist.gov, generated 2004-01-01T00:00:00.000-05:00)
  references: OSVDB 5707 (http://www.osvdb.org/5707); CONFIRM http://www.openbsd.org/errata23.html#tcpfix
  summary: ip_input.c in BSD-derived TCP/IP implementations allows remote attackers to cause a denial of service (crash or hang) via crafted packets.

Here is the curl command:

curl http://127.0.0.1:8983/solr/nvd-rss/dataimport?command=full-import

And here is the output from the console for Jetty:

main{StandardDirectoryReader(segments_1:1:nrt)}
2407 [coreLoadExecutor-5-thread-1] INFO org.apache.solr.core.CoreContainer registering core: nvd-rss
2409 [main] INFO org.apache.solr.servlet.SolrDispatchFilter user.dir=/Users/carlroberts/dev/solr-4.10.3/example
2409 [main] INFO org.apache.solr.servlet.SolrDispatchFilter SolrDispatchFilter.init() done
2431 [main] INFO org.eclipse.jetty.server.AbstractConnector Started SocketConnector@0.0.0.0:8983
2450 [searcherExecutor-6-thread-1] INFO org.apache.solr.core.SolrCore [nvd-rss] webapp=null path=null params={event=firstSearcher&q=static+firstSearcher+warming+in+solrconfig.xml&distrib=false} hits=0 status=0 QTime=43
2451 [searcherExecutor-6-thread-1] INFO org.apache.solr.core.SolrCore QuerySenderListener done.
2451 [searcherExecutor-6-thread-1] INFO org.apache.solr.handler.component.SpellCheckComponent Loading spell index for spellchecker: default
2451 [searcherExecutor-6-thread-1] INFO org.apache.solr.handler.component.SpellCheckComponent Loading spell index for spellchecker: wordbreak
2452 [searcherExecutor-6-thread-1] INFO org.apache.solr.handler.component.SuggestComponent Loading suggester index for: mySuggester
2452 [searcherExecutor-6-thread-1] INFO org.apache.solr.spelling.suggest.SolrSuggester reload()
2452 [searcherExecutor-6-thread-1] INFO org.apache.solr.spelling.suggest.SolrSuggester build()
2459 [searcherExecutor-6-thread-1] INFO org.apache.solr.core.SolrCore [nvd-rss] Registered new searcher Searcher@df9e84e[nvd-rss] main{StandardDirectoryReader(segments_1:1:nrt)}
8371 [qtp1640586218-17] INFO org.apache.solr.handler.dataimport.DataImporter Loading DIH Configuration: rss-data-config.xml
8379 [qtp1640586218-17] INFO org.apache.solr.handler.dataimport.DataImporter Data Configuration loaded successfully
8383 [Thread-15] INFO org.apache.solr.handler.dataimport.DataImporter Starting Full Import
8384 [qtp1640586218-17] INFO org.apache.solr.core.SolrCore [nvd-rss] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=15
8396 [Thread-15] INFO org.apache.solr.handler.dataimport.SimplePropertiesWriter Read dataimport.properties
23431 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommi
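(Since the attached ZIPURLDataSource.java is not reproduced here, what follows is a minimal sketch of what such a DIH data source can look like; the timeouts, the UTF-8 encoding, and the single-zip-entry assumption are mine, not the original poster's:

    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.net.URL;
    import java.net.URLConnection;
    import java.util.Properties;
    import java.util.zip.ZipInputStream;

    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.DataImportHandlerException;
    import org.apache.solr.handler.dataimport.DataSource;

    public class ZIPURLDataSource extends DataSource<Reader> {

      @Override
      public void init(Context context, Properties initProps) {
        // A fuller version would read baseUrl/encoding/timeout settings here.
      }

      @Override
      public Reader getData(String query) {
        try {
          URLConnection conn = new URL(query).openConnection();
          conn.setConnectTimeout(5000);
          conn.setReadTimeout(30000);
          ZipInputStream zis = new ZipInputStream(conn.getInputStream());
          // The NVD zips contain a single XML file; position at the first entry.
          if (zis.getNextEntry() == null) {
            throw new IOException("Empty zip at " + query);
          }
          // DIH reads the decompressed XML directly from this Reader.
          return new InputStreamReader(zis, "UTF-8");
        } catch (Exception e) {
          throw new DataImportHandlerException(DataImportHandlerException.SEVERE,
              "Exception reading zipped URL " + query, e);
        }
      }

      @Override
      public void close() {
        // DIH closes the Reader returned by getData().
      }
    }

It would be referenced from the config as <dataSource type="ZIPURLDataSource"/> (or the fully qualified class name) with the compiled class on the core's lib path. If the unzipped XML is valid but zero rows import, the forEach XPath is the first thing to re-check against the feed's actual root element.)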