Re: [Dbpedia-discussion] .bz2 problem

Jona Christopher Sahnwaldt Sun, 21 Apr 2013 12:19:26 -0700

On 21 April 2013 19:46, Ahmed Ktob <kto...@gmail.com> wrote:
> Well, first I should mention that I am using IntelliJ IDEA on Windows 7.
> I can't try on Linux right now because my work is on Windows and I don't
> have enough free space ))
>
> Also, I am following this tutorial [1] to do the abstract extraction.
> I followed it up to the data import step, which failed for me with
> the error:
>
> java.lang.IllegalArgumentException: found no directory
> C:\Users\AHMED\Desktop\arwiki/[YYYYMMDD] containing file
> arwiki-[YYYYMMDD]-pages-articles.xml
>
> So I started reading the Import.scala code and figured that changing the
> file name in this code:
>
> val tagFile = if (requireComplete) Download.Complete else "pages-articles.xml"
> val date = finder.dates(tagFile).last
> val file = finder.file(date, "pages-articles.xml")
>
> to "pages-articles.xml.bz2" might work. I did it and it worked (I got
> past this step).
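
For readers following along, the error above appears to come from a lookup of roughly the following shape. This is a minimal sketch, not the framework's Finder class: the object DumpLookup and its methods are hypothetical, and the directory layout (baseDir/arwiki/YYYYMMDD/arwiki-YYYYMMDD-suffix) is assumed from the error message.

```scala
import java.io.File

// Hypothetical stand-in for the framework's Finder; names and layout are assumptions.
object DumpLookup {
  // Dump directories are assumed to be named by date, e.g. 20130421.
  private val DatePattern = "\\d{8}".r

  /** Expected file name, e.g. arwiki-20130421-pages-articles.xml */
  def fileName(wiki: String, date: String, suffix: String): String =
    s"$wiki-$date-$suffix"

  /** Latest date directory under baseDir/wiki that contains the expected file, if any. */
  def latestDate(baseDir: File, wiki: String, suffix: String): Option[String] = {
    val wikiDir = new File(baseDir, wiki)
    // listFiles returns null when the directory does not exist.
    val dates = Option(wikiDir.listFiles).getOrElse(Array.empty[File])
      .filter(_.isDirectory)
      .map(_.getName)
      .filter(name => DatePattern.pattern.matcher(name).matches())
      .filter(date => new File(new File(wikiDir, date), fileName(wiki, date, suffix)).isFile)
      .sorted
    dates.lastOption
  }
}
```

With only a ...-pages-articles.xml.bz2 file on disk, a lookup for the suffix "pages-articles.xml" finds no matching directory, which matches the symptom reported above.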


Please pull the latest version from GitHub. Let git overwrite your
changes in Import.scala. Maybe git can merge your changes in pom.xml
(your folder) with the new parameter (dump file name).
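
The earlier message quoted below says Import.scala now unzips automatically when the suffix is .gz or .bz2. A minimal sketch of what suffix-based handling could look like follows; it is not the actual Import.scala code. The object DumpStreams, the Wrapper type, and the open method are hypothetical names. Only gzip ships with the JDK, so a .bz2 decoder (for example from a third-party compression library) is assumed to be registered by the caller.

```scala
import java.io.{BufferedInputStream, File, FileInputStream, InputStream}
import java.util.zip.GZIPInputStream

// Hypothetical helper, not part of the extraction framework.
object DumpStreams {
  type Wrapper = InputStream => InputStream

  // The JDK only provides gzip; a ".bz2" entry must be added by the caller.
  val defaultWrappers: Map[String, Wrapper] = Map(
    ".gz" -> ((in: InputStream) => new GZIPInputStream(in))
  )

  /** Opens the file, decompressing on the fly if a wrapper matches its suffix. */
  def open(file: File, wrappers: Map[String, Wrapper] = defaultWrappers): InputStream = {
    val name = file.getName
    val wrap = wrappers.collectFirst { case (suffix, w) if name.endsWith(suffix) => w }
    // Fail loudly rather than hand compressed bytes to an XML parser.
    if (wrap.isEmpty && name.endsWith(".bz2"))
      throw new IllegalArgumentException(s"no decoder registered for $name")
    val raw = new BufferedInputStream(new FileInputStream(file))
    wrap.fold[InputStream](raw)(w => w(raw))
  }
}
```

Files with any other suffix are returned as plain streams, so an already unzipped pages-articles.xml keeps working.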

>
> After Dimitris's answer, I redid my changes and uncommented the source
> line, as he mentioned, in both extraction.abstracts.properties and
> extraction.default.properties, but I couldn't get past this step (the
> same error as above).
>
> I am using Maven 3.0.4, and to start Maven I just followed the guide :
> clean -> install (on Parent Pom of the DBPedia framework)
> Scala:run (on DBpedia Dump Extraction)
>
> For now I only want the default extraction, not the abstracts, but I can't
> find a guide. Any suggestions?

https://github.com/dbpedia/extraction-framework/wiki/Extraction-Instructions

>
> Thank you so much.
>
> Cheers,
> Ahmed.
>
> [1]
> https://github.com/dbpedia/extraction-framework/wiki/Dbpedia-Abstract-Extraction-step-by-step-guide
>
>
> On 21 April 2013 18:19, Jona Christopher Sahnwaldt <j...@sahnwaldt.de> wrote:
>>
>> Ahmed,
>>
>> if things still don't work for you, please tell us exactly what you
>> are trying to do: which Maven launcher? How do you start it? Please
>> attach a copy of the configuration files and Scala files that you
>> edited and a text file containing the complete Maven output.
>>
>> Cheers,
>> JC
>>
>> On 21 April 2013 19:17, Jona Christopher Sahnwaldt <j...@sahnwaldt.de>
>> wrote:
>> > Hi,
>> >
>> > Dimitris is right. Ahmed was referring to Import.scala, but that's
>> > probably not what's causing the problem.
>> >
>> > Ahmed, please try to edit the config file as Dimitris said and the
>> > extraction should work. You only need Import.scala if you want to
>> > extract abstracts.
>> >
>> > Anyway, I just added some code to make Import.scala more flexible. I
>> > also added a new argument in dump/pom.xml: users can now specify the
>> > name of the XML dump file, and Import.scala will automatically unzip
>> > if the suffix is .gz or .bz2.
>> >
>> > If you encounter any problems, let us know.
>> >
>> > Cheers,
>> > JC
>> >
>> > On 21 April 2013 18:08, Jona Christopher Sahnwaldt <j...@sahnwaldt.de>
>> > wrote:
>> >> Hi,
>> >>
>> >> hm, no, sorry, in this case that won't work. The Import class is not
>> >> configurable enough. I think Import.scala can't handle zipped files at
>> >> all, so changing the name won't help either. I'll have a look, maybe I
>> >> can fix this quickly.
>> >>
>> >> Cheers,
>> >> JC
>> >>
>> >> On 21 April 2013 18:00, Dimitris Kontokostas <jimk...@gmail.com> wrote:
>> >>> Hi Ahmed,
>> >>>
>> >>> in the default configuration files you will find the following lines
>> >>> # default:
>> >>> # source=pages-articles.xml
>> >>>
>> >>> # alternatives:
>> >>> # source=pages-articles.xml.bz2
>> >>> # source=pages-articles.xml.gz
>> >>>
>> >>> You should comment out / uncomment the ones that suit you
>> >>>
>> >>> Best,
>> >>> Dimitris
>> >>>
>> >>>
>> >>>
>> >>> On Sun, Apr 21, 2013 at 2:24 AM, Ahmed Ktob <kto...@gmail.com> wrote:
>> >>>>
>> >>>> Hello guys,
>> >>>>
>> >>>> Today I was trying to use the extraction framework to extract data
>> >>>> for the Arabic language. When it came to finding the dump file in
>> >>>> the download directory, it didn't work. After a while I found that
>> >>>> part of the code in the file Import.scala is written as follows:
>> >>>>
>> >>>> try {
>> >>>>   for (language <- languages) {
>> >>>>     val finder = new Finder[File](baseDir, language, "wiki")
>> >>>>     val tagFile = if (requireComplete) Download.Complete else "pages-articles.xml"
>> >>>>     val date = finder.dates(tagFile).last
>> >>>>     val file = finder.file(date, "pages-articles.xml")
>> >>>>
>> >>>> I tried to change the name to "pages-articles.xml.bz2" and the
>> >>>> extraction successfully passed this point.
>> >>>>
>> >>>> My point is, don't you think that we should make the change I
>> >>>> mentioned above? When we download the dump file, it comes with
>> >>>> ".bz2" in the name.
>> >>>>
>> >>>> Best regards,
>> >>>> Ahmed.
>> >>>> --
>> >>>> ------------------------------------------------
>> >>>> Ahmed Ktob
>> >>>> Dr. Taher Moulay University
>> >>>> Department of Computer Science
>> >>>> Saida , Algeria
>> >>>> Tel : +213 554 811 151
>> >>>> ------------------------------------------------
>> >>>>
>> >>>>
>> >>>>
>> >>>> ------------------------------------------------------------------------------
>> >>>> Precog is a next-generation analytics platform capable of advanced
>> >>>> analytics on semi-structured data. The platform includes APIs for
>> >>>> building
>> >>>> apps and a phenomenal toolset for data science. Developers can use
>> >>>> our toolset for easy data analysis & visualization. Get a free
>> >>>> account!
>> >>>> http://www2.precog.com/precogplatform/slashdotnewsletter
>> >>>> _______________________________________________
>> >>>> Dbpedia-discussion mailing list
>> >>>> Dbpedia-discussion@lists.sourceforge.net
>> >>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Kontokostas Dimitris
>> >>>
>> >>>

