Re: [Dspace-tech] sitemaps in 1.8
The generated sitemap does not include any links to bitstreams. As I understand it, a sitemap lists all the links a crawler should digest, essentially saying "these links but no others". If that is how it works, crawlers will not index bitstreams when using the standard generated sitemap.

As you can see, http://dataspace.princeton.edu/jspui/sitemap?map=0 mentions
http://dataspace.princeton.edu/jspui/handle/88435/dsp01k643b120w
but not its bitstream
http://dataspace.princeton.edu/jspui/bitstream/88435/dsp01n583xv02q/1/Atuei-HabanaCuba-Nov1927.pdf

In fact, running

    zcat sitemap*.xml.gz | fgrep bitstream

finds nothing on the server.

Am I missing something here?

Monika

--
Monika Mevenkamp
phone: 609-258-4161
123 693 Alexander Street, Princeton University, Princeton, NJ 08544

___
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
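For readers without zcat at hand, the same check can be sketched in Python. This is only an illustration: the function name count_matching_lines is made up here, and the default file pattern simply mirrors the sitemap*.xml.gz glob used in the message.

```python
import glob
import gzip
import os


def count_matching_lines(directory, needle="bitstream", pattern="sitemap*.xml.gz"):
    """Count lines containing `needle` in the gzipped sitemap files of `directory`.

    Roughly equivalent to: zcat sitemap*.xml.gz | fgrep -c bitstream
    (hypothetical helper, not part of DSpace).
    """
    total = 0
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        # "rt" decompresses and decodes each file as text, line by line.
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            total += sum(1 for line in handle if needle in line)
    return total
```

Called as count_matching_lines("/dspace/sitemaps"), a result of 0 would correspond to fgrep finding nothing.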
Re: [Dspace-tech] sitemaps in 1.8
Hi Monika,

the sitemap helps spiders discover the site, but it does not limit them from crawling other URLs; robots.txt serves that purpose. DSpace therefore feeds the spider a list of all item pages, and spiders will also index the links found on those pages, which in our case includes bitstreams. Recently there were even improvements to how bitstreams are listed in the item page meta headers, based on feedback and best practices from Google Scholar.

So you don't have to worry that your bitstreams will be ignored because you're using a sitemap.

Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
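The distinction between a sitemap (discovery) and robots.txt (restriction) can be illustrated with Python's standard urllib.robotparser. The robots.txt content below is hypothetical and not taken from any real DSpace installation; the helper name is_crawlable and the user agent "ExampleBot" are likewise made up for this sketch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: it blocks search pages and advertises a sitemap,
# but says nothing about bitstream URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /jspui/search
Sitemap: http://dataspace.princeton.edu/jspui/sitemap
"""


def is_crawlable(robots_txt, url, agent="ExampleBot"):
    """Return True if robots_txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    bitstream = ("http://dataspace.princeton.edu/jspui/bitstream/"
                 "88435/dsp01n583xv02q/1/Atuei-HabanaCuba-Nov1927.pdf")
    # The bitstream is not listed in the sitemap, yet nothing forbids crawling it.
    print(is_crawlable(ROBOTS_TXT, bitstream))
    print(is_crawlable(ROBOTS_TXT, "http://dataspace.princeton.edu/jspui/search?query=x"))
```

Under these assumed rules only the Disallow line restricts a crawler; a URL being absent from the sitemap has no effect on whether it may be fetched.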
Re: [Dspace-tech] sitemaps in 1.8
/jspui/htmlmap works great, and so does /jspui/sitemap for the XML version.

Thanks,
Monika
[Dspace-tech] sitemaps in 1.8
I generated sitemaps with the dspace generate-sitemap command, which created lots of files in /dspace/sitemaps. But I am not sure which URL to use to get to these generated files.

I used the Apache directive

    Alias /sitemap /dspace/sitemaps

so you can have a look at the generated files as they sit on the file system, right here:
http://asdspace300l.princeton.edu/sitemap/

The generated sitemap_index.xml.gz file
(http://asdspace300l.princeton.edu/sitemap/sitemap_index.xml.gz) contains

    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>http://asdspace300l.princeton.edu/jspui/sitemap?map=0</loc>
        <lastmod>2014-11-04T13:14:20Z</lastmod>
      </sitemap>
      <sitemap>
        <loc>http://asdspace300l.princeton.edu/jspui/sitemap?map=1</loc>
        <lastmod>2014-11-04T13:14:20Z</lastmod>
      </sitemap>
    </sitemapindex>

But a GET of http://asdspace300l.princeton.edu/jspui/sitemap?map=1 triggers the computation of the sitemap content, as opposed to loading the corresponding file from the sitemap directory.

What is the right way to configure this?

Monika
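To see which URLs a crawler would be sent to by such an index file, the entries can be read with Python's standard xml.etree.ElementTree. This is a minimal sketch: the function name list_child_sitemaps is invented for illustration, and EXAMPLE_INDEX reproduces the two entries quoted in the message with the markup restored.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, as declared in the index file.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

EXAMPLE_INDEX = """\
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://asdspace300l.princeton.edu/jspui/sitemap?map=0</loc>
    <lastmod>2014-11-04T13:14:20Z</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://asdspace300l.princeton.edu/jspui/sitemap?map=1</loc>
    <lastmod>2014-11-04T13:14:20Z</lastmod>
  </sitemap>
</sitemapindex>
"""


def list_child_sitemaps(index_xml):
    """Return a list of (loc, lastmod) pairs, one per <sitemap> entry."""
    root = ET.fromstring(index_xml)
    entries = []
    for node in root.findall("sm:sitemap", SITEMAP_NS):
        loc = node.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = node.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        entries.append((loc, lastmod))
    return entries


if __name__ == "__main__":
    for loc, lastmod in list_child_sitemaps(EXAMPLE_INDEX):
        print(loc, lastmod)
```

Both entries point at /jspui/sitemap?map=N rather than at the Apache alias, which matches the observation that the index refers crawlers to the webapp URL.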
Re: [Dspace-tech] sitemaps in 1.8
Hello Monika,

the link to access the sitemaps is http://asdspace300l.princeton.edu/jspui/htmlmap, which is contained as the relative link jspui/htmlmap in your footer.

Hope this helps,
Claudia