Thanks everyone for the replies... I need to cut and paste the "quote" (odds) section, and for some strange reason it comes out laid out that way!

On 29 September 2011 at 16:21, Antonio <antoniopo...@gmail.com> wrote:

> Scrapy is excellent too... with BeautifulSoup you can't do more complex
> XPath queries.
>
> 2011/9/28 Daniel Pyrathon <piro...@gmail.com>
>
>> Hi Balan
>>
>> I wrote a small component that parses a text file (structured the way
>> you want) and builds a list of dictionaries from it.
>>
>> Given:
>> Serie A
>> 18:00
>> Bologna
>> Inter
>> 1:3
>> 20:45
>> Milan
>> Cesena
>> 1:0
>> 20:45
>> Napoli
>> Fiorentina
>> 0:0
>> Serie B
>> 18:00
>> Bologna
>> Inter
>> 1:3
>> 20:45
>> Milan
>> Cesena
>> 1:0
>> 20:45
>> Napoli
>> Fiorentina
>> 0:0
>>
>> it would return:
>>
>> [{'teams': [{'final_score': '1:3', 'team_a': 'Bologna', 'team_b': 'Inter',
>> 'time': '18:00'}, {'final_score': '1:0', 'team_a': 'Milan', 'team_b':
>> 'Cesena', 'time': '20:45'}, {'final_score': '0:0', 'team_a': 'Napoli',
>> 'team_b': 'Fiorentina', 'time': '20:45'}], 'title': 'Serie A'}, {'teams':
>> [{'final_score': '1:3', 'team_a': 'Bologna', 'team_b': 'Inter', 'time':
>> '18:00'}, {'final_score': '1:0', 'team_a': 'Milan', 'team_b': 'Cesena',
>> 'time': '20:45'}, {'final_score': '0:0', 'team_a': 'Napoli', 'team_b':
>> 'Fiorentina', 'time': '20:45'}], 'title': 'Serie B'}]
>>
>> Script:
>>
>> import re
>>
>> class TeamParser(object):
>>     def __init__(self, file_path):
>>         self._file_path = file_path
>>         self._result = None
>>
>>     @property
>>     def result(self):
>>         # parse lazily, only on first access
>>         if self._result is None:
>>             self._result = self._parse_file()
>>         return self._result
>>
>>     def _parse_file(self):
>>         result = []
>>         current_series = None
>>
>>         with open(self._file_path, 'r') as f:
>>             for line in f:
>>                 line = line.rstrip()
>>                 # an empty line marks the end of the data
>>                 if len(line) == 0:
>>                     break
>>
>>                 # if a new series starts, store the previous one and reset the list
>>                 if re.search(r'Serie\s\w$', line):
>>                     if current_series:
>>                         result.append(self._parse_team(current_series))
>>                     current_series = []
>>
>>                 # skip anything that comes before the first series title
>>                 if current_series is not None:
>>                     current_series.append(line)
>>
>>         # store the last series, if any
>>         if current_series:
>>             result.append(self._parse_team(current_series))
>>         return result
>>
>>     def _parse_team(self, series):
>>         result = {'title': series[0], 'teams': []}
>>         # each match takes four lines: time, team A, team B, final score
>>         number_games = (len(series) - 1) // 4
>>
>>         for game_index in range(number_games):
>>             index = 1 + game_index * 4
>>             team = series[index:index + 4]
>>             result['teams'].append({'time': team[0], 'team_a': team[1],
>>                                     'team_b': team[2], 'final_score': team[3]})
>>         return result
>>
>> x = TeamParser('path to your file')
>> print(x.result)  # <-- results
>>
>> pastebin: http://pastebin.com/JN0pSQ0j
>>
>> I don't think it works with your second file; in that case go for
>> scraping. There are plenty of nice libraries, including BeautifulSoup,
>> which is fantastic and written entirely in Python.
>>
>> Regards, and feel free to ask about anything!
>>
>> Daniel Pyrathon
>>
>> On 28 September 2011 at 10:59, Balan Victor <balan.vict...@gmail.com>
>> wrote:
>>
>>> What do these better things consist of? Thanks
>>>
>>> On 28 September 2011 at 08:44, Enrico Franchi
>>> <enrico.fran...@gmail.com> wrote:
>>>
>>>> Balan Victor wrote:
>>>>
>>>>> I think I managed to do what I wanted... what do you think?
>>>>
>>>> Let's just say I've seen you write better things... ;)
>>>>
>>>> --
>>>> .
>>>> ..: -enrico-
>>>>
>>>> _______________________________________________
>>>> Python mailing list
>>>> Python@lists.python.it
>>>> http://lists.python.it/mailman/listinfo/python
>>
>> --
>> *************
>> PirosB3
>> http://pirosb3.com
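
The grouping idea in the quoted script (a "Serie X" title followed by blocks of four lines: time, team A, team B, final score) can also be sketched as a single pass over the lines. This is only a rough sketch under assumptions: the names parse_results, SERIES_HEADER and sample are made up for illustration, blank lines are skipped rather than treated as the end of the data, and anything before the first title is ignored.

```python
import re

# Assumption: a section title looks like "Serie A", "Serie B", ...
SERIES_HEADER = re.compile(r"^Serie\s\w+$")


def parse_results(lines):
    """Group one-value-per-line text into a list of series dictionaries."""
    result = []
    current = None   # dictionary of the series being filled
    fields = []      # lines collected so far for the current match
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines instead of stopping
        if SERIES_HEADER.match(line):
            current = {'title': line, 'teams': []}
            result.append(current)
            fields = []
            continue
        if current is None:
            continue  # ignore anything before the first title
        fields.append(line)
        if len(fields) == 4:
            time, team_a, team_b, final_score = fields
            current['teams'].append({'time': time, 'team_a': team_a,
                                     'team_b': team_b,
                                     'final_score': final_score})
            fields = []
    return result


sample = """Serie A
18:00
Bologna
Inter
1:3
20:45
Milan
Cesena
1:0
"""
parsed = parse_results(sample.splitlines())
print(parsed)
```

Since it takes any iterable of lines, the same function should work on an open file object or on text pasted into a string; an incomplete trailing match (fewer than four lines) is simply dropped.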