You can crawl any website you want and generate your own index with Apache Nutch. Instructions on how to do this are here:
https://wiki.apache.org/nutch/NutchTutorial

Regards,
Javier

> On 23 Feb 2016, at 12:06, nishtala <[email protected]> wrote:
>
> Hi Javier,
>
> One more question, the index size is 12g.
> the memory on my machine is 8g.
> is there any way to use a different type of index? like the wikipedia index?
> if yes, what changes do i need to do?
>
> Best,
> Rajiv
>
> On 2016-02-23 12:04, Javier Picorel wrote:
>> Hi,
>>
>> I’ve never run both on the same machine. Try run it on different machines.
>>
>> If you are violating the 99-th percentile latency, you can change the number
>> of concurrent clients with the SCALE parameter.
>>
>> Regards,
>> Javier
>>
>>> On 23 Feb 2016, at 07:28, nishtala <[email protected]> wrote:
>>>
>>> Hi Javier,
>>>
>>> On 2016-02-23 00:21, Javier Picorel wrote:
>>>> Hi Nishtala,
>>>>
>>>> I see two problems.
>>>>
>>>> 1) Are the client and server machines in the 0.0.0.X subnet? For some
>>>> reason, it cannot open a TCP/IP connection on that address. Can you
>>>> verify that this is a valid subnet?
>>> Could it be because the server-client are on the same machine?
>>>> 2) Don’t forget to set the $IP global variable. It has to point to the IP
>>>> of the server machine.
>>> I did set the IP variable.
>>>> Hope this helps!
>>>>
>>>> Regards,
>>>> Javier
>>>>
>>>>> On 22 Feb 2016, at 16:37, nishtala <[email protected]> wrote:
>>>>>
>>>>> <server.sh>
>>>
>>> WARNING / LEGAL TEXT: This message is intended only for the use of the
>>> individual or entity to which it is addressed and may contain
>>> information which is privileged, confidential, proprietary, or exempt
>>> from disclosure under applicable law. If you are not the intended
>>> recipient or the person responsible for delivering the message to the
>>> intended recipient, you are strictly prohibited from disclosing,
>>> distributing, copying, or in any way using this message. If you have
>>> received this communication in error, please notify the sender and
>>> destroy and delete any copies you may have received.
>>>
>>> http://www.bsc.es/disclaimer
